propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License
1.61k stars 112 forks source link

Added Utils.resolve_url and some tests #14

Closed dannguyen closed 11 years ago

dannguyen commented 11 years ago

Added a way to deal with relative urls and a slightly-altered test in the upton_spec suite to show that it works. However, I think in the long run, some changes will have to be made to the Upton API, which currently assumes a single kind of @index_url (as an instance variable created at initialization).

There may be some extreme edge cases that resolve_url doesn't work on. Compare its shortness versus the version by Mechanize:

https://github.com/sparklemotion/mechanize/blob/5ac3970a1b1475e8325bbf29b54abf0f7b3ecb60/lib/mechanize/http/agent.rb

jeremybmerrill commented 11 years ago

Looks awesome. I want to take a closer look, but I expect to merge this soon.

Also, as I said, I'm not an RSpec expert, by any means. Is what you've got in spec/upton_spec.rb in 4397c28 the idiomatic RSpec way to write as-yet-unwritten tests? (e.g. just it "should do whatever" without a following block).

dannguyen commented 11 years ago

That seems to be the easiest way. If you want to leave more explanation why, you can also throw in the block with a pending

  it 'has to do this' do 
    pending 'for some reason'
  end
dannguyen commented 11 years ago

Also, hopefully the test suite won't require anyone here to be an RSpec framework ;). I switched to it from MiniTest...the syntax is only slightly different and it provides a lot of nice conveniences, including at the commandline (running single test files, or single tests, or testnames that match a regex) and apparently is simpler to mock things with out of the box (never have used many Mocks myself though). I don't think the speed advantage of MiniTest will matter much given the relatively few tests that this should all need

kgrz commented 11 years ago

@dannguyen Major plus is the availability of plugins I suppose. And there would definitely be more resources for rspec than MiniTest/MiniTestSpec. :)

jeremybmerrill commented 11 years ago

Merged just now, for 0.2.7.