gosom / google-maps-scraper

Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.
MIT License
724 stars, 93 forks

Problems connecting script with AWS RDS #30

Closed: LL-AI-dev closed this issue 4 months ago

LL-AI-dev commented 4 months ago

Running this repo against a PostgreSQL server on AWS RDS hangs after about 5 jobs have completed. Even before it hangs, the process is much slower than expected compared with running it against a PostgreSQL server on localhost.

While I can work around this limitation, it would be nice if the code could export directly to the remotely hosted database server.

========

Update: The original errors were due to a simple usage mistake. When working with a database, the -email flag should only be used when executing the jobs already queued in the gmaps_jobs table. Passing the flag when creating the jobs results in errors.

The correct usage is:

#Add the jobs to the queue in the database table: gmaps_jobs
go run main.go \
    -dsn $DSN \
    -produce \
    -input example-queries.txt \
    -lang en

#execute the jobs in the queue
go run main.go \
    -c 3  \
    -depth 3 \
    -dsn $DSN \
    -email
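To keep the -email flag from ending up on the producer step by accident, the two commands above can be wrapped in a small script. This is a hypothetical sketch, not something shipped with the repo: the function names `produce_jobs` and `consume_jobs` and the `DRY_RUN` switch are assumptions, and it only reuses the flags shown above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the two documented commands (not part of the repo).
set -euo pipefail

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Phase 1: fill gmaps_jobs from the query file. No -email flag here.
produce_jobs() {
  local dsn="$1" input="${2:-example-queries.txt}"
  run go run main.go -dsn "$dsn" -produce -input "$input" -lang en
}

# Phase 2: execute the queued jobs. -email is passed on this step only.
consume_jobs() {
  local dsn="$1"
  run go run main.go -c 3 -depth 3 -dsn "$dsn" -email
}
```

Usage: `produce_jobs "$DSN" && consume_jobs "$DSN"`. Prefixing either call with `DRY_RUN=1` prints the exact command line instead of running it, which is a quick way to check which flags each phase receives.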


Everything below this is now irrelevant to the issue.


When trying to use this repo in conjunction with AWS RDS to host a PostgreSQL server, I encounter the following errors.

To set up the RDS database, I created the "gmaps_jobs" table using the create_tables.up.sql script and manually created the "results" table with two columns:

  • id : integer : primary_key & not_null
  • data : jsonb : not_null
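The id column above is NOT NULL with no default, so any insert that supplies only the data column would fail with a not-null violation on id, consistent with one of the errors reported below. A minimal sketch of a results table where PostgreSQL generates id itself, assuming the scraper inserts only data (the error suggests this, but the thread does not confirm it) and PostgreSQL 10 or later for identity columns; the helper name `results_table_ddl` is hypothetical, and `SERIAL` is the older equivalent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print DDL for a results table whose id is generated
# by PostgreSQL. Assumes the scraper inserts only the data column.
set -euo pipefail

results_table_ddl() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS results (
    id   INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    data JSONB NOT NULL
);
SQL
}
```

Usage: `results_table_ddl | psql "$DSN"`. Because of `IF NOT EXISTS`, an existing results table is left untouched; for the manually created table you would either drop it first or add an identity to the existing column with `ALTER TABLE results ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;`.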


I run the following commands to queue the jobs, and they fill the gmaps_jobs table as expected.

export DSN="postgres://postgres:postgres@[aws-endpoint]:5432/postgres"

#Add the jobs to the queue in the database table: gmaps_jobs
go run main.go \
    -dsn $DSN \
    -produce \
    -input example-queries.txt \
    -email


However, when running the second part:

#execute the jobs in the queue
go run main.go \
    -c 3  \
    -depth 3 \
    -dsn $DSN

the log contains many lines like this:

{"level":"error","component":"scrapemate","error":"invalid job type: while pushing jobs","time":"2024-02-12T01:05:50.189031649Z","message":"error while finishing job"}


Then the script exits with one of two errors, either:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x8625de]

OR

ERROR: null value in column "id" of relation "results" violates not-null constraint (SQLSTATE 23502)

(In a third case, the script sometimes just hangs after one of the "invalid job type" errors above.)

Do you have any suggestions as to what is causing this?


========================
Edit: When running the code locally against a PostgreSQL server on localhost, the run completes successfully, but the logs show a lot of the gmaps jobs failing:

{"level":"info","component":"scrapemate","numOfJobsCompleted":84,"numOfJobsFailed":75,"lastActivityAt":"2024-02-12T04:23:45.97806321Z","speed":"28.00 jobs/min","time":"2024-02-12T04:23:49.053938604Z","message":"scrapemate stats"}

However, if the code runs in a Docker container and writes the results to a .csv file, all of the jobs succeed without any failures.

gosom commented 4 months ago

So, did you manage to make it work?

LL-AI-dev commented 4 months ago

Sorry for the confusing post. Although it hung with AWS RDS yesterday, today it is working fine.

Closing the issue because there is nothing to fix.