gosom / google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, review count, latitude and longitude, reviews, email, and more for each place.
MIT License

Guide to Running Web Interface on AWS EC2 #87

Open mattyjacks opened 1 month ago

mattyjacks commented 1 month ago

I managed to get this thing working via AWS EC2! YAY! I decided to write a little guide on it.

First, launch an Ubuntu Server 24.04 LTS instance. I use a t2.xlarge ($0.18 per hour) with 25 GB of storage; you can stop the instance when you're not using it to save money.

[Screenshot: guide to creating google web scraper instance]

Then connect to the instance. Using EC2 Instance Connect with the default username is fine.

Here are the commands you have to run:

git clone https://github.com/gosom/google-maps-scraper.git

sudo apt-get update

sudo apt install golang-go

sudo apt-get install libatk1.0-0 libatk-bridge2.0-0 libcups2 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libpango-1.0-0 libcairo2 liboss4-salsa-asound2

sudo apt-get install liboss4-salsa-asound2

sudo apt-get update

sudo apt-get upgrade

sudo apt install nodejs npm

sudo npm install -g playwright

sudo apt-get install libasound2 libasound2-plugins

rm -rf ~/.cache/ms-playwright

playwright install

sudo npx playwright install-deps

uname -m

npx playwright install firefox

npx playwright install webkit

cd google-maps-scraper

go mod download

go build

(Adjust the number after -c to match your EC2 instance: use 1 less than the number of cores it has. The instance I chose has 4 cores, so I use 3.)

./google-maps-scraper -web -c 3
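The "one less than the number of cores" rule can be worked out on the instance instead of by hand. The following is a minimal sketch, not part of the original guide: `worker_count` is a hypothetical helper name, and the floor of 1 on single-core machines is an assumption. It only prints the command so it can be reviewed before running.

```shell
# Hypothetical helper: print one less than the given core count,
# but never fewer than 1 (assumption for single-core instances).
worker_count() {
  if [ "$1" -gt 1 ]; then
    echo $(( $1 - 1 ))
  else
    echo 1
  fi
}

# nproc (GNU coreutils) reports the available cores; getconf is a fallback.
cores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
workers="$(worker_count "$cores")"

# Print the command rather than running it, so it can be checked first.
echo "./google-maps-scraper -web -c ${workers}"
```

On the 4-core t2.xlarge described above, this should print the same `-c 3` command as the guide.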

Edit the inbound security group rules of the EC2 instance to allow TCP port 8080 from anywhere.

[Screenshot: aws ec2 edit security group rules]

Visit port 8080 of the EC2 instance's public IP address, for example 54.147.206.100:8080. Be sure to use HTTP instead of HTTPS, or it won't connect.
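To avoid the HTTP/HTTPS mix-up, the full URL can be built explicitly and probed from the command line. This is only a sketch: `scraper_url` is a hypothetical helper, and the IP address is the example from the guide, not a live server.

```shell
# Hypothetical helper: build the plain-HTTP URL for the web interface on port 8080.
scraper_url() {
  printf 'http://%s:8080/\n' "$1"
}

# Example address from the guide; substitute your instance's public IP.
url="$(scraper_url "54.147.206.100")"
echo "$url"

# To probe it from your own machine (prints only the HTTP status code):
#   curl -sS -o /dev/null -w '%{http_code}\n' --max-time 10 "$url"
```

If the curl probe times out, the security group rule for port 8080 is the first thing to check.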

[Screenshot: aws ec2 running scraper]

Above is what the scraper looks like in action.

THANK YOU @gosom FOR YOUR WONDERFUL TOOL!

gosom commented 1 month ago

@mattyjacks It's nice that it works for you, but I have a few points:

(1) For security reasons, the webapp is NOT DESIGNED (at the moment) to be publicly available. I HIGHLY recommend you IMMEDIATELY allow ONLY your IP to access the tool until an authentication system is in place.

(2) I think it's easier to run it via a docker container.

Thank you very much for trying this on AWS.

mattyjacks commented 1 month ago

Thank you for the quick response.

In response to 1: I'll be shutting down the tool as soon as this scrape job is finished (to save money), and when I start it again Amazon will assign a new IP address anyway. I wasn't planning on sharing the IP address that would let others access it.

In response to 2: Yeah, probably. I've never used docker before, tho.

I'm overall very satisfied with the result. One huge advantage of the AWS EC2 approach is it's not tying my IP address to the scraping activity in Google's eyes. Pretty paranoid about getting banned from Google.

gosom commented 1 month ago

Even if you do not share the IP address, this is still not safe. People might break into your server.

I recommend configuring the firewall to allow connections only from your IP address.
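On EC2, the firewall in question is the instance's security group. As a rough sketch of that recommendation using the AWS CLI: the security group ID and the client IP below are placeholder assumptions, `to_cidr` and the two `print_*` helpers are hypothetical names, and the commands are printed rather than executed so they can be reviewed first.

```shell
# Placeholder values (assumptions): replace with your own security group ID
# and your own public IP address.
SG_ID="sg-0123456789abcdef0"
MY_IP="203.0.113.7"
# Your current public IP can be looked up with, for example:
#   MY_IP="$(curl -s https://checkip.amazonaws.com)"

# Turn a bare IPv4 address into a single-host CIDR block.
to_cidr() {
  printf '%s/32\n' "$1"
}

# Print the AWS CLI call that removes the "open to anywhere" rule on port 8080.
print_revoke_cmd() {
  printf 'aws ec2 revoke-security-group-ingress --group-id %s --protocol tcp --port 8080 --cidr 0.0.0.0/0\n' "$1"
}

# Print the AWS CLI call that allows port 8080 from one address only.
print_allow_cmd() {
  printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 8080 --cidr %s\n' "$1" "$(to_cidr "$2")"
}

print_revoke_cmd "$SG_ID"
print_allow_cmd "$SG_ID" "$MY_IP"
```

The same change can be made in the EC2 console by editing the inbound rule and choosing your own IP as the source instead of anywhere.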

Additionally, you might consider using proxies if you want to mask your IP address.

In any case the tool is for educational purposes only.