ruipgil / scraperjs

A complete and versatile web scraper.
MIT License

Using Asynchronous ScraperFn #7

Closed ksloan closed 10 years ago

ksloan commented 10 years ago

Is there any way to use an asynchronous function as the ScrapeFn? I have a URL where I need to set an interval to load extra data into the DOM before I actually do any scraping, and then, when a certain condition is met, do the actual scrape.

The examples show a return statement, but is there any way to do this with a callback? Thanks!

ruipgil commented 10 years ago

Could you provide an example?

ksloan commented 10 years ago

For example, on this page there is a 'load more' button that has to be clicked until all the events show in the DOM before I can scrape them. So I've set up an interval that checks the status of the load more button and clicks it if needed, like so:

if ($('.load_more_link').length > 0) {
    // Click "load more" once a second until the link disappears.
    var clicker = setInterval(function() {
        $('.load_more_link').click();

        if ($('.load_more_link').length === 0) {
            clearInterval(clicker); // done here
        }
    }, 1000);
}

Then once it's done, I want to return the total number of events along with some other info... but I don't see how I can return any info once the function becomes asynchronous.
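To make the question concrete, here is a minimal sketch of the same interval loop reporting its result through a callback instead of a return statement. It is not scraperjs API: `clickUntilGone`, `exists`, `click` and `onDone` are hypothetical names, and `exists`/`click` stand in for the `$('.load_more_link')` calls so the sketch runs without a DOM.

```javascript
// Hypothetical helper, not part of scraperjs.
// `exists` and `click` stand in for $('.load_more_link').length > 0
// and $('.load_more_link').click() from the snippet above.
function clickUntilGone(exists, click, intervalMs, onDone) {
    var clicks = 0;

    if (!exists()) {
        // Nothing to load; still report asynchronously so callers
        // always get the result the same way.
        setTimeout(function () { onDone(clicks); }, 0);
        return;
    }

    var clicker = setInterval(function () {
        if (exists()) {
            click();
            clicks += 1;
        } else {
            clearInterval(clicker);
            onDone(clicks); // the "return value" is handed over here
        }
    }, intervalMs);
}

// Usage with a fake page that needs three clicks before the link disappears.
var remaining = 3;
clickUntilGone(
    function () { return remaining > 0; },
    function () { remaining -= 1; },
    10,
    function (clicks) { console.log('done after ' + clicks + ' clicks'); }
);
```

The open question is whether the scraper can wait for something like `onDone` before moving on to the next step.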

cjackie commented 10 years ago

This is a common pattern. PhantomJS does have the functionality to trigger events after a page has loaded. Maybe we could add it to ScraperPromise.js, so that we have something like:

var scraperjs = require('scraperjs');
scraperjs.DynamicScraper.create('https://news.ycombinator.com/')
    .triggerEvent(function() {
        // trigger events
    });

But since PhantomJS is sandboxed, we need to think of a way to tell the promise that the event has finished. Setting a waiting time is one way (a poor one).
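One alternative to a fixed waiting time is to poll for a condition and give up after a timeout. The sketch below is only an illustration of that idea in plain JavaScript: `waitFor` is a hypothetical helper, not something scraperjs or PhantomJS provides, and in PhantomJS the predicate would have to be evaluated inside the page (for instance through `page.evaluate`) because of the sandbox.

```javascript
// Hypothetical helper, not part of scraperjs or PhantomJS.
// Resolves with the elapsed time once `predicate()` returns true,
// rejects if that has not happened within `timeoutMs`.
function waitFor(predicate, intervalMs, timeoutMs) {
    return new Promise(function (resolve, reject) {
        var started = Date.now();

        (function check() {
            if (predicate()) {
                resolve(Date.now() - started);
            } else if (Date.now() - started >= timeoutMs) {
                reject(new Error('condition not met within ' + timeoutMs + ' ms'));
            } else {
                setTimeout(check, intervalMs);
            }
        })();
    });
}

// Usage: a flag that flips after 50 ms stands in for "the event has finished".
var finished = false;
setTimeout(function () { finished = true; }, 50);

waitFor(function () { return finished; }, 10, 1000)
    .then(function (elapsed) { console.log('finished after ~' + elapsed + ' ms'); })
    .catch(function (err) { console.error(err.message); });
```

Compared with a plain delay, this continues as soon as the condition holds and fails loudly when it never does.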

ruipgil commented 10 years ago

@ksloan, for that kind of site I find it is better just to inspect the AJAX calls and work from there. You could also try to use the delay promise between two scrape promises. @cjackie, I might look into that sometime in the future.
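As a rough sketch of the "inspect the AJAX calls" approach: if the 'load more' button turns out to request pages of data from some endpoint, those pages can be collected directly instead of clicking. Everything here is an assumption for illustration: `fetchAllEvents` and `fetchPage` are hypothetical names, and the real endpoint, its paging scheme and its response shape would have to be read from the site's network traffic.

```javascript
// Hypothetical: `fetchPage(n)` returns a Promise for the array of events on
// page n of whatever endpoint the "load more" button calls. An empty array
// is assumed to mean there are no more pages.
function fetchAllEvents(fetchPage) {
    var all = [];

    function next(page) {
        return fetchPage(page).then(function (events) {
            if (events.length === 0) {
                return all; // no more pages: hand back everything collected
            }
            all = all.concat(events);
            return next(page + 1);
        });
    }

    return next(1);
}

// Usage with a fake endpoint serving two pages of made-up data.
var fakePages = { 1: ['a', 'b'], 2: ['c'] };
fetchAllEvents(function (page) {
    return Promise.resolve(fakePages[page] || []);
}).then(function (events) {
    console.log(events.length + ' events'); // prints "3 events"
});
```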

ruipgil commented 10 years ago

v0.3.0 has an async promise. Have a look at it: it allows you to check for events and then trigger them.

ksloan commented 10 years ago

Amazing!! Thank you!

emgould commented 9 years ago

Can you provide an example using async?