nrabinowitz / pjscrape

A web-scraping framework written in Javascript, using PhantomJS and jQuery
http://nrabinowitz.github.io/pjscrape/
MIT License
996 stars 159 forks source link

Implement dynamic next page functionality. #27

Open zejn opened 12 years ago

zejn commented 12 years ago

Sometimes a page has POST based navigation (yes, sadly this happens) and there isn't an URL where you could point pjscrape to go. In this case you need to start at a specific URL and navigate by issuing click events on certain DOM elements to get to the desired page. And this is something a scraping tool such as pjscrape, which runs in the browser, can really do well.

This pull request implements this functionality via two properties on pjs.suite. The "nextPage" is a function which determines and triggers an event, guiding browser to next page. It returns true when next page was requested and false otherwise. The other property is "maxDepth" which determines how many times can next page be requested (useful for example for scraping POST based pagination, but no more than N pages). Included test demonstrates the functionality.

zejn commented 11 years ago

Ping? Comments?