matthewmueller / x-ray

The next web scraper. See through the <html> noise.
MIT License
5.88k stars 350 forks

404 & 500 url validation #178

Closed JephMarketing closed 5 years ago

JephMarketing commented 8 years ago

Subject of the issue

I am crawling and pulling emails from numerous sites; however, when I add a broken link to my array of URLs to crawl, my script breaks. I am curious whether there is anything built in to handle 404 and 500 responses.

Your environment

node 4.4.3, npm 2.15.1 (bundled with the Node.js download)

Expected behaviour

Right now, if all links in the imported .csv file are valid, it works beautifully; if not, I get the following error:

{ [Error: getaddrinfo ENOTFOUND -------------.com -------------.com:80] code: 'ENOTFOUND', errno: 'ENOTFOUND', syscall: 'getaddrinfo', hostname: '--------.com', host: '------------.com', port: 80, response: undefined }

I am sorry if I missed this in the docs or if it is an easy question, but I am really trying to find a simple, efficient solution. Thanks in advance for your help!
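One way to keep a single broken link from aborting the whole run is to give each URL its own error callback and collect failures instead of throwing. The sketch below assumes a `scrape(url, cb)` function standing in for an x-ray call such as `x(url, selector)(cb)` — the `scrape` name is a hypothetical placeholder, not part of x-ray's API, and the error-classification is just one plausible shape:

```javascript
// Sketch: crawl every URL in the list, isolating failures per URL so that
// one ENOTFOUND / 404 / 500 does not break the script.
// `scrape` is a hypothetical stand-in for an x-ray invocation like
// x(url, 'a@href')(cb); its callback signature (err, data) matches
// the usual Node.js convention that x-ray also follows.
function crawlAll(urls, scrape, done) {
  var results = [];   // successful scrapes: { url, data }
  var failures = [];  // failed scrapes: { url, code }
  var remaining = urls.length;
  if (remaining === 0) return done(results, failures);

  urls.forEach(function (url) {
    scrape(url, function (err, data) {
      if (err) {
        // e.g. a DNS failure (code 'ENOTFOUND' from getaddrinfo)
        // or an HTTP-level 404/500 surfaced by the scraper
        failures.push({ url: url, code: err.code || String(err) });
      } else {
        results.push({ url: url, data: data });
      }
      // run the completion callback once every URL has settled
      if (--remaining === 0) done(results, failures);
    });
  });
}

module.exports = crawlAll;
```

With this shape, the bad rows from the .csv end up in `failures` for logging or a retry pass, while the valid URLs are processed normally.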

lathropd commented 5 years ago

You’ve probably dealt with this another way by now. Sorry we never got back to you. Closing for now as stale. If you have example code you can share, it would still be helpful to see.