Closed by lachlansleight 1 year ago
Hey. To be honest I don't really work on this package anymore, except for fixing bugs or updating dependencies. Maybe you could make this change yourself and create a pull request? I have a very extensive test suite (external, not part of the package), so I could test your changes and see if anything was broken.
Done: #24 :)
I'll check it out during this weekend. Thanx!
Heya, did you end up taking a look at this? It's a pretty minor change and an optional feature - it'd be nice to be able to stop using my own weird local copy of the project :P
Hey!
Thanks again for this awesome tool, I use it all the time and it works amazingly.
I have a request for a feature addition - presently, the only way to determine whether an error occurred during a scrape is (as far as I can tell) to examine the output finalErrors.json file. I often use this library in serverless functions to deal with sites that don't have public APIs. In these instances, I'm not able to save or retrieve local files.
It would be very useful if, as part of the scraper callback, I could pass in an error handling function so as to know what went wrong. I imagine the API looking something like this:
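A hedged sketch of what that could look like, assuming a new `errorCallback` option on the scraper config (the option name and shape are my suggestion, not part of the library's current API; the other fields mirror the existing config):

```javascript
// Hypothetical config: errorCallback is the proposed new option,
// not something the library currently supports.
const config = {
    baseSiteUrl: 'https://example.com',
    startUrl: 'https://example.com/articles/',
    // Called whenever a scraping action fails, e.g. on a 404.
    errorCallback: (error) => {
        // The user decides what to do: log it, track retry counts, or
        // reject a pending promise so a serverless endpoint can respond
        // with an error instead of timing out.
        console.error('Scraping failed:', error.message);
    },
};
```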
and it being implemented in Scraper.js, in the reportFailedScrapingAction function. It would be up to the library user to deal with errors - whether to wait for retries, to manage how many there have been, etc. - but this small change would be extremely useful.
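For illustration, a minimal sketch of how reportFailedScrapingAction might forward failures to such a callback. This is not the actual Scraper.js source; the class is stripped down to the relevant part, and the errorCallback option is the proposed addition:

```javascript
// Stripped-down illustration, not the real Scraper class from Scraper.js.
class Scraper {
    constructor(config = {}) {
        this.config = config;
        this.failedScrapingActions = [];
    }

    reportFailedScrapingAction(error) {
        // Existing behavior: record the error (in the real library this
        // data is what ends up in finalErrors.json).
        this.failedScrapingActions.push(error);

        // Proposed addition: notify the user-supplied callback, if any.
        if (typeof this.config.errorCallback === 'function') {
            this.config.errorCallback(error);
        }
    }
}

// Usage: the callback fires immediately on failure, so a serverless
// handler can return an error response instead of timing out.
const seen = [];
const scraper = new Scraper({ errorCallback: (e) => seen.push(e.message) });
scraper.reportFailedScrapingAction(new Error('404 Not Found'));
```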
(My specific issue right now is that my API endpoint just times out when the scraping URL returns a 404, since there's currently no way to detect that the scrape failed and return an error to my endpoint's caller.)