m-wrzr / populartimes

MIT License

Scraper - IP Blocking #88

Closed: jerryatt closed this issue 2 years ago

jerryatt commented 4 years ago

Hi,

For the crawler: since we are scraping popular hours, is it possible that google.de blocks our IP if we ping too many place_id's in a short span of time? I see we have a User Agent specified, but has this been observed by other users, and is there a way to ensure it does not happen?
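One common way to lower the risk of a block is to throttle requests on the client side. The sketch below is a minimal token-bucket limiter under stated assumptions: `RateLimiter`, `crawl` and the `fetch` callable are hypothetical names for illustration, not part of the populartimes API, and the rate Google actually tolerates is not known from this thread.

```python
import time


class RateLimiter:
    """Hypothetical token bucket (not part of populartimes): on average at
    most `rate` acquisitions per second, with bursts of up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self):
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough for one whole token to accumulate.
            self.sleep((1 - self.tokens) / self.rate)


def crawl(place_ids, fetch, limiter):
    """Call the hypothetical fetch(place_id) for each ID, throttled by limiter."""
    results = {}
    for place_id in place_ids:
        limiter.acquire()
        results[place_id] = fetch(place_id)
    return results
```

With `rate=0.5, capacity=1`, for example, requests go out at most once every two seconds. The `clock` and `sleep` parameters only exist so the limiter can be tested without real waiting.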

Thanks

lbedogni commented 4 years ago

I have not experienced this. Since you are using an API_KEY, I'd say there's no problem, because you are "limited" by the API cost.

jerryatt commented 4 years ago

Thanks lbedogni. Looking through the code, yes, the first ping goes to the API with the place ID, but the subsequent popular-hours scraping is a web scraping function. Have you done this at scale for a large number of place IDs (say 100k+)? I have not faced this issue with a smaller number of records, but I am not sure whether it will happen for hundreds of records pinged in a distributed fashion.
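At that scale, a crawler usually also needs to back off when it does get blocked rather than keep hammering the endpoint. A minimal sketch of retrying with exponential backoff and full jitter, assuming your own fetch function raises a hypothetical `BlockedError` when Google returns HTTP 429 or a captcha page instead of data; neither `BlockedError` nor `fetch_with_backoff` comes from populartimes.

```python
import random
import time


class BlockedError(Exception):
    """Hypothetical: raised by your fetch function when the response looks
    like a block (e.g. HTTP 429 or a captcha page) instead of data."""


def fetch_with_backoff(fetch, place_id, max_retries=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Call fetch(place_id), retrying on BlockedError with exponential
    backoff and full jitter; re-raise once max_retries is exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(place_id)
        except BlockedError:
            if attempt == max_retries:
                raise
            # Cap grows as base_delay * 2**attempt, bounded by max_delay;
            # the actual wait is a random fraction of that cap.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(rng() * cap)
```

In a distributed setup, each worker's limits add up: for example, ten workers each capped at one request per second still produce about ten requests per second overall.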