propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License

Implement @pagination_interval #34

Closed: caseycesari closed this 10 years ago

caseycesari commented 10 years ago

I'm currently working on scraping a website (sorry, can't share it at the moment) that handles pagination more like an SQL offset than with page numbers. Each page displays 20 instances. The first page is start=1, the second page is start=21, the third is start=41, and so on. This pull request gives the user the ability to specify a number other than 1 to be added to pagination_index to get the next page. The default value of pagination_interval is 1. For this site, setting up Upton to handle pagination would look like this:

```ruby
scraper.paginated = true
scraper.pagination_param = 'start'
scraper.pagination_interval = 20
scraper.pagination_max_pages = 12001
```
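To make the offset arithmetic concrete, here is a minimal standalone sketch of the URLs such a configuration would visit. It is not Upton's code: `paginated_urls` and its keyword arguments are hypothetical names, and it assumes the first page is start=1 and that the cap is compared against the parameter's value rather than a count of pages (which is what `pagination_max_pages = 12001` alongside an interval of 20 suggests).

```ruby
require 'uri'

# Hypothetical helper, not part of Upton's API: lists the index-page URLs
# for an offset-style listing. Assumes the cap (max_index) is compared
# against the pagination parameter's value, not against a page count.
def paginated_urls(base_url, param:, start: 1, interval: 1, max_index:)
  uri = URI.parse(base_url)
  urls = []
  index = start
  while index <= max_index
    # Keep any existing query parameters and overwrite only the pagination one
    query = uri.query ? URI.decode_www_form(uri.query).to_h : {}
    query[param] = index.to_s
    page_uri = uri.dup
    page_uri.query = URI.encode_www_form(query)
    urls << page_uri.to_s
    index += interval # step by the interval (20 here) instead of 1
  end
  urls
end

puts paginated_urls('http://example.com/list?sort=date',
                    param: 'start', interval: 20, max_index: 61)
# http://example.com/list?sort=date&start=1
# http://example.com/list?sort=date&start=21
# http://example.com/list?sort=date&start=41
# http://example.com/list?sort=date&start=61
```

Under the same assumption, a cap of 12001 with an interval of 20 would allow at most 601 requests (start=1, 21, ..., 12001), since 1 + 20k ≤ 12001 gives k ≤ 600.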
jeremybmerrill commented 10 years ago

Awesome awesome awesome, thanks!