hellock / icrawler

A multi-thread crawler framework with many builtin image crawlers provided.
http://icrawler.readthedocs.io/en/latest/
MIT License

Tips for IP not getting banned by google #27

gajewsk2 closed 7 years ago

gajewsk2 commented 7 years ago

Just curious whether people have had their IP blacklisted by Google when using this, and whether there are tips to avoid it, e.g. a maximum number of requests in a given amount of time. What measures does the crawler already take in this regard?

hellock commented 7 years ago

Currently there is no limit on request frequency, since getting banned by Google is uncommon unless you crawl very aggressively. You can hack the feeder or parser to add some delay between two requests.
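A minimal sketch of such a delay, kept independent of icrawler itself: the `Throttle` class and its `min_interval` and `jitter` parameters are hypothetical names for illustration, not part of the library. The assumption is that you would call `wait()` from your own feeder or parser subclass right before each request is issued.

```python
import random
import time


class Throttle:
    """Enforce a minimum interval (plus optional random jitter) between calls.

    Hypothetical helper, not part of icrawler.
    """

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between two requests
        self.jitter = jitter              # extra random delay in [0, jitter]
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of previous request, None at start

    def wait(self):
        """Sleep until the interval since the previous call has elapsed.

        Returns the number of seconds actually slept.
        """
        delay = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0.0, self.jitter)
            elapsed = self._clock() - self._last
            delay = max(0.0, target - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Usage (assumed call site: just before each request in your own code):
#     throttle = Throttle(min_interval=2.0, jitter=1.0)
#     for url in urls:
#         throttle.wait()
#         fetch(url)  # placeholder for whatever issues the request
```

The jitter avoids a perfectly regular request pattern. Since the crawler is multi-threaded, a single instance shared across threads would also need a lock around `wait()`.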

Some measures for this issue are described in the proxy documentation.

gajewsk2 commented 7 years ago

That link to the proxy documentation seems to be corrupted.

hellock commented 7 years ago

Fixed.