johntitus / node-horseman

Run PhantomJS from Node
MIT License

So many 'failed to GET url' #323

Open minotaurrr opened 6 years ago

minotaurrr commented 6 years ago

I'm just calling horseman.open('https://www.google.com') for testing, but I'm getting a lot of "failed to GET url" errors at random times. Roughly 7 out of 10 attempts fail.

Any idea why?

nelsonwittwer commented 6 years ago

I kicked the tires on this library following the project docs and saw a similar thing. Both the Twitter and Google examples failed to run.

horseman v3.3.0, node v8.9.1

minotaurrr commented 6 years ago

I tried on multiple hosts and noticed that the failure frequency varies, but I still get the same error eventually.

grohsfabian commented 6 years ago

Bumping this topic; the same thing is happening to me.

NoelDavies commented 6 years ago

Bumping this too. I'm getting it repeatedly, nor can I catch the errors.

t0ursene commented 6 years ago

minotaurrr, Google detects scrapers and bans your IP address very quickly. That means you can only call horseman.open('http://google.com') ONCE every 5 minutes. If you want to scrape it more than once per 5 minutes, you need to:

jorgerosal commented 6 years ago

Google must have banned your IP. Either set a time interval between GET requests, or set up a list of proxies and cycle through them randomly.
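
A minimal sketch of both suggestions (a pause between attempts and a randomly chosen proxy per attempt), written as plain helpers so it runs without PhantomJS. The names `sleep`, `pickProxy`, and `openWithRetry` are hypothetical and not part of node-horseman, and the 5000 ms default delay is an arbitrary assumption. The commented usage assumes Horseman's documented `proxy` and `proxyType` options and uses placeholder proxy addresses.

```javascript
// Hypothetical helpers; none of these names come from node-horseman.

// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Pick one entry of `proxies` at random. `random` is injectable for testing.
function pickProxy(proxies, random = Math.random) {
  return proxies[Math.floor(random() * proxies.length)];
}

// Call `attempt(proxy)` until it resolves, waiting `delayMs` between tries
// and choosing a fresh random proxy for each try.
// Rethrows the last error once `retries` attempts have failed.
async function openWithRetry(attempt, proxies, { retries = 3, delayMs = 5000 } = {}) {
  let lastError;
  for (let i = 0; i < retries; i++) {
    try {
      return await attempt(pickProxy(proxies));
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is coming.
      if (i < retries - 1) await sleep(delayMs);
    }
  }
  throw lastError;
}

// Usage sketch (needs node-horseman installed; proxy addresses are placeholders):
//
// const Horseman = require('node-horseman');
// openWithRetry((proxy) => {
//   const horseman = new Horseman({ proxy, proxyType: 'http' });
//   return horseman
//     .open('https://www.google.com')
//     .title()
//     .finally(() => horseman.close());
// }, ['127.0.0.1:8080', '127.0.0.1:8081']).then(console.log, console.error);
```

Creating a new Horseman instance per attempt is a design choice here, since the proxy is set when the instance is constructed. Whether this avoids the "failed to GET url" error depends on whether rate limiting is actually the cause, which this thread does not confirm.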