propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License

Handle pagination out-of-the-box #17

Closed (bxjx closed this issue 10 years ago)

bxjx commented 11 years ago

It would be nice if Upton handled common implementations of pagination with minimal configuration.

As the docs point out, you've already made it super easy to handle paginated indexes by overriding next_index_page_url, but I think it would be nice to have pagination built into the library. It could be enabled with an attribute on the scraper instance, something like propubscraper.paginate = true. Other options could set the query string parameter name (defaulting to page or p) and the maximum number of results to scrape.
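To make the idea concrete, here is a rough sketch of the URL-building part, using only Ruby's standard library. The helper name paginated_url and its param and max_pages options are made up for illustration and are not part of Upton. Note that it caps the number of pages, which is a simplification of the "maximum number of results" idea.

```ruby
require "uri"

# Hypothetical helper, not part of Upton: returns the URL for a given index
# page by setting a query-string parameter (default "page").
# Returns nil once page_number exceeds max_pages, which this sketch treats
# as the signal to stop paginating.
def paginated_url(url, page_number, param: "page", max_pages: nil)
  return nil if max_pages && page_number > max_pages

  uri = URI.parse(url)
  # Drop any existing value for the pagination parameter, then append the new one.
  query = URI.decode_www_form(uri.query || "").reject { |key, _| key == param }
  query << [param, page_number.to_s]
  uri.query = URI.encode_www_form(query)
  uri.to_s
end

puts paginated_url("http://example.com/list?q=ruby&page=1", 2)
puts paginated_url("http://example.com/list", 3, param: "p")
```

Something like this could presumably be called from an overridden next_index_page_url, or from inside the library when pagination is switched on, with the scraper stopping as soon as it gets nil back.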

I'm happy to give you a pull request if you think it's worth doing. Thanks for the useful gem btw!

jeremybmerrill commented 11 years ago

Hey @bxjx, that sounds awesome. I'm not sure exactly how to implement that, but I'd love to hear your proposed solution and would definitely accept a pull request.

jeremybmerrill commented 10 years ago

Closed ages ago.