ibrod83 / nodejs-web-scraper


Error Callback #23

Closed: lachlansleight closed this issue 1 year ago

lachlansleight commented 2 years ago

Hey!

Thanks again for this awesome tool, I use it all the time and it works amazingly.

I have a request for a feature addition. At present, the only way to determine whether an error occurred during a scrape is (as far as I can tell) to examine the output finalErrors.json file. I often use this library in serverless functions to scrape sites that don't have public APIs, and in those environments I can't save or retrieve local files.

It would be very useful if I could pass an error-handling function in the scraper config, so that I know what went wrong. I imagine the API looking something like this:

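```javascript
const { Scraper } = require('nodejs-web-scraper');

const scraper = new Scraper({
    baseSiteUrl: `http://example.com`,
    startUrl: `http://example.com`,
    // Proposed option: called with a description of each failed scraping action.
    onError: errorString => console.error("scraping failed: ", errorString),
});
```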

and have it implemented in Scraper.js, inside the reportFailedScrapingAction function, like so:

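```javascript
// Scraper.js (method of the Scraper class)
reportFailedScrapingAction(errorString) {
    // Existing behaviour: record the failure in the scraper's state.
    this.state.failedScrapingIterations.push(errorString);
    // Proposed addition: forward the error to the user's callback, if one was given.
    if (typeof this.config.onError === 'function') this.config.onError(errorString);
}
```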
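To see the proposed hook in isolation, here is a minimal, self-contained sketch. `SketchScraper` is a hypothetical stand-in that models only the state and config relevant to `onError`; it is not the library's real `Scraper` class, and `onError` is the option proposed in this issue rather than a confirmed part of the library's API. The one design choice beyond the proposal is wrapping the callback in `try/catch`, so a buggy user callback cannot abort the scrape.

```javascript
// Hypothetical stand-in for the library's Scraper class; only the pieces
// relevant to the proposed onError hook are modelled here.
class SketchScraper {
  constructor(config = {}) {
    this.config = config;
    this.state = { failedScrapingIterations: [] };
  }

  reportFailedScrapingAction(errorString) {
    // Existing behaviour: keep recording the failure in state.
    this.state.failedScrapingIterations.push(errorString);
    // Proposed behaviour: notify the caller if they supplied a callback.
    if (typeof this.config.onError === "function") {
      try {
        this.config.onError(errorString);
      } catch (callbackError) {
        // A faulty user callback should not break the scraper itself.
        console.error("onError callback threw:", callbackError);
      }
    }
  }
}

// Usage: collect errors in memory instead of reading finalErrors.json.
const seen = [];
const scraper = new SketchScraper({
  onError: (errorString) => seen.push(errorString),
});
scraper.reportFailedScrapingAction("404 http://example.com/missing");
console.log(seen.length); // 1
```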

It would be up to the library user to handle errors (whether to wait for retries, how many failures to tolerate, etc.), but this small change would be extremely useful.
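As an illustration of leaving the policy to the user, the sketch below builds a callback that simply tallies failures so the caller can decide afterwards whether the run is acceptable. `createErrorTracker` and `maxErrors` are hypothetical names introduced here, not part of nodejs-web-scraper, and the tracker would only be wired in through the `onError` option proposed above (for example `onError: tracker.onError`).

```javascript
// Hypothetical helper (not part of nodejs-web-scraper): builds an onError
// callback that tallies failures so the caller can apply its own policy.
function createErrorTracker(maxErrors) {
  const errors = [];
  return {
    // Safe to pass around detached (e.g. `onError: tracker.onError`),
    // because it closes over `errors` rather than relying on `this`.
    onError(errorString) {
      errors.push(errorString);
    },
    get count() {
      return errors.length;
    },
    // Returns a copy so callers cannot mutate the internal list.
    get errors() {
      return [...errors];
    },
    // True once more failures than the caller is willing to tolerate occurred.
    tooMany() {
      return errors.length > maxErrors;
    },
  };
}

// Usage: simulate three failures reported through the detached callback.
const tracker = createErrorTracker(2);
const onError = tracker.onError;
onError("timeout: http://example.com/a");
onError("404: http://example.com/b");
onError("500: http://example.com/c");
console.log(tracker.count, tracker.tooMany()); // 3 true
```

Keeping the tally in a closure rather than on `this` means the callback still works after being detached from the tracker object, which is how a config option would receive it.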

(My specific problem right now is that my API endpoint simply times out when my scraping URL returns a 404, because there's no way to detect that the scrape failed and return an error to whoever called the endpoint.)
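For the serverless case, a callback like this would let the endpoint answer with an error instead of waiting. The sketch below is a hypothetical handler, not code from the library: `runScrape` is an assumed function that starts the scrape (for instance by constructing the scraper with the proposed `onError` option and awaiting it), receives the callback to wire in, and resolves with the scraped data. The `{ status, body }` response shape and the choice of 502 are likewise assumptions for illustration.

```javascript
// Hypothetical endpoint logic. `runScrape(onError)` stands in for whatever
// starts the scrape and passes `onError` through to the scraper config.
async function handleRequest(runScrape) {
  const errors = [];
  try {
    const data = await runScrape((errorString) => errors.push(errorString));
    if (errors.length > 0) {
      // Tell the requester the scrape failed instead of leaving them waiting.
      return { status: 502, body: { error: "scraping failed", details: errors } };
    }
    return { status: 200, body: data };
  } catch (err) {
    // The scrape itself threw (e.g. a network failure).
    return { status: 500, body: { error: String(err) } };
  }
}

// Usage with a fake scrape that reports a 404 and yields no data.
const fakeScrape = async (onError) => {
  onError("404 Not Found: http://example.com");
  return [];
};
handleRequest(fakeScrape).then((res) => console.log(res.status)); // 502
```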

ibrod83 commented 2 years ago

Hey. To be honest, I don't really work on this package anymore, except to fix bugs or update dependencies. Maybe you could make this change yourself and open a pull request? I have a very extensive test suite (external, not part of the package), so I could test your changes and see whether anything breaks.

lachlansleight commented 2 years ago

Done: #24 :)

ibrod83 commented 2 years ago

I'll check it out this weekend. Thanks!

lachlansleight commented 2 years ago

Heya, did you end up taking a look at this? It's a pretty minor change and an optional feature; it'd be nice to be able to stop using my own weird local copy of the project :P