matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.87k stars 349 forks

Paginate and limit based on number of pages #311

Closed (Globerada closed 5 years ago)

Globerada commented 6 years ago

Hi. I have not found this information in the docs.

How can I paginate based on the number of pages the URL actually has? Below is the example I am using. Instead of setting a high limit so that I crawl all the pages, how can I set a valid limit based on the real number of pages?

x('http://www.example.com/products', 'div.products_details_container', data)
  .paginate('.pagination a:last-child@href')
  .limit(999)
  .write('results.json');

lathropd commented 5 years ago

The current approach is not to use a limit, but to write your pagination selector so that it stops matching once you run out of pages, which yours probably should.
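
To illustrate the idea of letting the selector end the crawl, here is a minimal sketch in plain JavaScript. It is a model of the stopping rule only, not x-ray's actual implementation: `site`, `crawl`, and `maxPages` are hypothetical names, and `site` is an in-memory stand-in for fetched pages, where `next` plays the role of whatever href the pagination selector returns (and is `null` when the selector matches nothing).

```javascript
// Hypothetical in-memory stand-in for a paginated site. Each entry holds the
// items scraped from that page and the href a "next page" selector would
// yield there (null when the selector matches nothing).
const site = {
  '/products?page=1': { items: ['a', 'b'], next: '/products?page=2' },
  '/products?page=2': { items: ['c', 'd'], next: '/products?page=3' },
  '/products?page=3': { items: ['e'], next: null }, // last page: no "next" link
};

// Follow the "next" href until it is missing. `maxPages` is only a safety
// cap; it is not what ends the crawl in the normal case.
function crawl(pages, startUrl, maxPages = Infinity) {
  const results = [];
  const visited = [];
  let url = startUrl;
  while (url && visited.length < maxPages) {
    const page = pages[url];
    if (!page) break; // unknown URL: nothing to scrape
    visited.push(url);
    results.push(...page.items);
    url = page.next; // null on the last page ends the loop
  }
  return { results, visited };
}

// The cap is 999, but the crawl stops on its own after the three real pages.
const { results, visited } = crawl(site, '/products?page=1', 999);
console.log(visited.length, results);
```

In x-ray terms, this would correspond to pointing `.paginate()` at a selector that only matches while a further page exists, for instance a hypothetical `.pagination a.next@href` if the site marks its "next" link that way, rather than at the last link in the pagination block, which may still be present on the final page.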