gosom / google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email and more for each place.
MIT License

Thank you Georgios! #52

Open lexciobotariu opened 6 months ago

lexciobotariu commented 6 months ago

I've been running the script with 5k queries for the last 10 hours, and it has got to the point where it is using over 200 GB of RAM; I've set it to use 35 cores.

It has scraped over 300k businesses so far (see attached screenshot).

I'm just a bit worried that it won't finish the entire list of queries before crashing due to lack of RAM. Any suggestions on how to continue the scraping if it crashes and resume from where it left off?

admbyz commented 6 months ago

Try fewer cores, or split your keywords and run the batches sequentially. I have shared a script for this in the closed issue https://github.com/gosom/google-maps-scraper/issues/35. Also make sure you are running the latest version.
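
To make that concrete, here is a rough Go sketch of the same split-and-run-sequentially idea (it is not the exact script from #35). It splits `keywords.txt` into fixed-size chunks and runs the scraper binary on one chunk at a time, so memory stays bounded by a single chunk. The flag names passed to the binary (`-input`, `-results`, `-c`) are assumptions on my part; check the README or the `-h` output of your version.

```go
// splitrun.go: run the scraper sequentially over fixed-size keyword chunks.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"os/exec"
	"strings"
)

const chunkSize = 200 // keywords per run; tune to your RAM budget

func main() {
	f, err := os.Open("keywords.txt") // one search query per line
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var keywords []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if line := strings.TrimSpace(sc.Text()); line != "" {
			keywords = append(keywords, line)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}

	for i := 0; i < len(keywords); i += chunkSize {
		end := i + chunkSize
		if end > len(keywords) {
			end = len(keywords)
		}
		n := i / chunkSize
		chunkFile := fmt.Sprintf("chunk_%d.txt", n)
		if err := os.WriteFile(chunkFile, []byte(strings.Join(keywords[i:end], "\n")+"\n"), 0o644); err != nil {
			log.Fatal(err)
		}

		// One chunk per process: each chunk writes its own results file, so a
		// crash never touches earlier output and you can resume from the
		// failed chunk. Flag names here are assumptions; verify with -h.
		cmd := exec.Command("./google-maps-scraper",
			"-input", chunkFile,
			"-results", fmt.Sprintf("results_%d.csv", n),
			"-c", "8")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			log.Printf("chunk %d failed: %v", n, err)
		}
	}
}
```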

gosom commented 6 months ago

@lexciobotariu what is the outcome of this? Have you managed to scrape all your keywords?

lexciobotariu commented 6 months ago

Hello there, it did manage to scrape all the information, ~500k records. @admbyz I did use your suggestion in the past; it was working perfectly.

admbyz commented 6 months ago

Ah, I misunderstood your problem. What you're asking for is actually quite hard, because I don't think Google returns static results for the same request, so to resume reliably the program would also need to validate the data Google sends back. Skipping already-scraped data is certainly more efficient, but in the end the total number of requests stays the same unless you only check the exact URL and skip the whole result set.

I didn't pay attention to the terminal output, but maybe you can extract the processed keywords from it, build a new keyword list (or remove the already-processed keywords from your existing one), and then, whenever the scraper is not running and the keyword list is not empty, start it again. Before doing that, check whether the scraper appends results to the file after a restart or replaces them; if it doesn't append, you'll need a fresh results file on every restart.

I don't recommend handling scraping this way; it's wonky and unreliable. Running the scraper with fewer cores is probably your best bet.
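
For completeness, here is a rough Go sketch of the "remove already-passed keywords before restarting" idea. It assumes a hypothetical `done.txt` that you maintain yourself, with one finished keyword per line (for example, appended after each successful chunk); the scraper itself does not produce such a file.

```go
// resume_filter.go: prune finished keywords from the list before a restart.
package main

import (
	"bufio"
	"log"
	"os"
	"strings"
)

// readLines returns the non-empty, trimmed lines of a file.
func readLines(path string) []string {
	f, err := os.Open(path)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var lines []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		if l := strings.TrimSpace(sc.Text()); l != "" {
			lines = append(lines, l)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
	return lines
}

func main() {
	// done.txt is a file you maintain yourself: one keyword per line that
	// has already completed successfully.
	done := map[string]bool{}
	for _, k := range readLines("done.txt") {
		done[k] = true
	}

	// Keep only the keywords that are not finished yet and point the next
	// scraper run at keywords_remaining.txt.
	var remaining []string
	for _, k := range readLines("keywords.txt") {
		if !done[k] {
			remaining = append(remaining, k)
		}
	}

	out := strings.Join(remaining, "\n") + "\n"
	if err := os.WriteFile("keywords_remaining.txt", []byte(out), 0o644); err != nil {
		log.Fatal(err)
	}
	log.Printf("%d keywords remaining", len(remaining))
}
```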