omkarcloud / google-maps-scraper

👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖
https://www.omkar.cloud/
MIT License

Can I run this script on Linux? #31

Closed: babynew closed this issue 10 months ago

babynew commented 10 months ago

Awesome script, working on Windows. Can I run this script on Linux? If so, how?

While running in Docker I got these errors:

Traceback (most recent call last):
  File "main.py", line 16, in <module>
    launch_tasks(*tasks_to_be_run)
  File "/usr/local/lib/python3.8/site-packages/bose/launch_tasks.py", line 54, in launch_tasks
    current_output = task.begin_task(current_data, task_config)
  File "/usr/local/lib/python3.8/site-packages/bose/base_task.py", line 219, in begin_task
    final = run_task(False, 0)
  File "/usr/local/lib/python3.8/site-packages/bose/base_task.py", line 160, in run_task
    create_directories(self.task_path)
  File "/usr/local/lib/python3.8/site-packages/bose/base_task.py", line 104, in create_directories
    _download_driver()
  File "/usr/local/lib/python3.8/site-packages/bose/base_task.py", line 34, in _download_driver
    download_driver()
  File "/usr/local/lib/python3.8/site-packages/bose/download_driver.py", line 47, in download_driver
    major_version = get_major_version(get_chrome_version())
  File "/usr/local/lib/python3.8/site-packages/chromedriver_autoinstaller_fix/__init__.py", line 41, in get_chrome_version
    return utils.get_chrome_version()
  File "/usr/local/lib/python3.8/site-packages/chromedriver_autoinstaller_fix/utils.py", line 140, in get_chrome_version
    path = get_linux_executable_path()
  File "/usr/local/lib/python3.8/site-packages/chromedriver_autoinstaller_fix/utils.py", line 204, in get_linux_executable_path
    raise ValueError("No chrome executable found on PATH")
ValueError: No chrome executable found on PATH

Chetan11-dev commented 10 months ago

Hello babynew, Regrettably, due to my tight schedule, I'm unable to assist you with your issue.

I encourage you to attempt resolving the problem on your own and, if successful, kindly share your solution in this thread. It would greatly benefit the community.

Thank you for your understanding.

fabioselau077 commented 10 months ago

Same here. On macOS it works fine, but on Ubuntu Linux (AWS EC2) it returns this same error. From what I've read, you need to specify the driver's path directly in Selenium's Chrome setup, but I didn't find this option in the code. I believe bose is what sets this path.

Resolved: https://github.com/omkarcloud/google-maps-scraper/discussions/33

babidi34 commented 10 months ago

@babynew to resolve your issue you should install Chromium. On Ubuntu: apt install chromium
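The ValueError reported at the top of this thread is raised when no Chrome executable is found on PATH, so a quick pre-flight check can confirm whether installing Chromium took effect. A minimal sketch, assuming the candidate binary names below (they are common Linux names for Chrome and Chromium, not a list taken from bose or chromedriver_autoinstaller_fix); the helper name `find_browser` is hypothetical:

```python
import shutil
from typing import Iterable, Optional

# Assumed candidate names (not taken from bose); adjust for your distribution.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_browser(names: Iterable[str] = CANDIDATES) -> Optional[str]:
    """Return the full path of the first browser binary found on PATH, else None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_browser()
    print(found or "No Chrome/Chromium executable found on PATH")
```

Running this inside the container before starting the scraper shows whether the browser is visible to Python at all, which separates a missing install from a driver configuration problem.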

babynew commented 10 months ago

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/bose/base_task.py", line 190, in run_task
    result = self.run(driver, data)
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 251, in run
    links = get_links()
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 230, in get_links
    should_exit, result = scroll_till_end(1)
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 195, in scroll_till_end
    rst = [driver.current_url]
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 528, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 0x55be35900e23
1 0x55be356295f6
2 0x55be3565c8dd
3 0x55be3565d81e
4 0x55be358c2638
5 0x55be358c6507
6 0x55be358d0c4c
7 0x55be358c7136
8 0x55be358959cf
9 0x55be358eab98
10 0x55be358ead68
11 0x55be358f9cb3
12 0x7f911c9d9b43

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/bose/bose_driver.py", line 301, in save_screenshot
    self.get_screenshot_as_file(
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 927, in get_screenshot_as_file
    png = self.get_screenshot_as_png()
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 963, in get_screenshot_as_png
    return b64decode(self.get_screenshot_as_base64().encode('ascii'))
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 975, in get_screenshot_as_base64
    return self.execute(Command.SCREENSHOT)['value']
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 0x55be35900e23
1 0x55be356295f6
2 0x55be3565bf26
3 0x55be35688f16
4 0x55be356850ad
5 0x55be35684815
6 0x55be355f6853
7 0x55be358c2638
8 0x55be358c6507
9 0x55be358d0c4c
10 0x55be358c7136
11 0x55be358959cf
12 0x55be355f4d78
13 0x7f911c96ed90

Task Started
[INFO] Downloading Driver for Chrome Version 116 in build/ directory. This is a one-time process. Download in progress...
/app/google-maps-scraper/build/116/chromedriver
Running in Docker, So adding sandbox arguments
Creating Driver with window_size=1920,1080 and user_agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36
Launched Browser
Failed to save screenshot
Closing Browser
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/bose/base_task.py", line 190, in run_task
    result = self.run(driver, data)
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 251, in run
    links = get_links()
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 230, in get_links
    should_exit, result = scroll_till_end(1)
  File "/app/google-maps-scraper/src/scrape_google_maps_links_task.py", line 195, in scroll_till_end
    rst = [driver.current_url]
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 528, in current_url
    return self.execute(Command.GET_CURRENT_URL)['value']
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 0x55be35900e23
1 0x55be356295f6
2 0x55be3565c8dd
3 0x55be3565d81e
4 0x55be358c2638
5 0x55be358c6507
6 0x55be358d0c4c
7 0x55be358c7136
8 0x55be358959cf
9 0x55be358eab98
10 0x55be358ead68
11 0x55be358f9cb3
12 0x7f911c9d9b43

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/google-maps-scraper/main.py", line 16, in <module>
    launch_tasks(*tasks_to_be_run)
  File "/usr/local/lib/python3.10/dist-packages/bose/launch_tasks.py", line 54, in launch_tasks
    current_output = task.begin_task(current_data, task_config)
  File "/usr/local/lib/python3.10/dist-packages/bose/base_task.py", line 219, in begin_task
    final = run_task(False, 0)
  File "/usr/local/lib/python3.10/dist-packages/bose/base_task.py", line 214, in run_task
    close_driver(driver)
  File "/usr/local/lib/python3.10/dist-packages/bose/base_task.py", line 181, in close_driver
    driver.close()
  File "/usr/local/lib/python3.10/dist-packages/bose/bose_driver.py", line 335, in close
    return super().close()
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 551, in close
    self.execute(Command.CLOSE)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSessionIdException: Message: invalid session id
Stacktrace:
0 0x55be35900e23
1 0x55be356295f6
2 0x55be3565bf26
3 0x55be35688f16
4 0x55be356850ad
5 0x55be35684815
6 0x55be355f6853
7 0x55be358c2638
8 0x55be358c6507
9 0x55be358d0c4c
10 0x55be358c7136
11 0x55be358959cf
12 0x55be355f4d78
13 0x7f911c96ed90

If you are running with Docker, kindly share your working Dockerfile.

babynew commented 10 months ago

Finally, it's working on Linux.

I got this error message because I was running Selenium in Docker and the container did not have enough shared memory (/dev/shm), so the browser would crash after just a few pages.

To fix this, I used the same docker command, but added -v /dev/shm:/dev/shm after docker run.
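Docker gives a container only 64 MB of /dev/shm by default, which is why mounting the host's /dev/shm helps. A minimal sketch for checking, from inside the container, how much shared memory is actually available; the helper name `shm_report` and the 1 GiB threshold are assumptions chosen for illustration, not values from this project:

```python
import os
import shutil


def shm_report(path: str = "/dev/shm", min_bytes: int = 1 << 30) -> dict:
    """Report total/free bytes for `path` and whether the total meets `min_bytes`.

    The 1 GiB default threshold is an illustrative assumption, not a
    documented requirement of Chrome, Selenium, or bose.
    """
    if not os.path.isdir(path):
        return {"path": path, "exists": False, "total": 0, "free": 0, "ok": False}
    usage = shutil.disk_usage(path)
    return {
        "path": path,
        "exists": True,
        "total": usage.total,
        "free": usage.free,
        "ok": usage.total >= min_bytes,
    }


if __name__ == "__main__":
    report = shm_report()
    print(report)
```

If the reported total is close to 64 MB, the container is still on Docker's default. Besides the bind mount used above, Docker's `--shm-size` option on `docker run` is another way to raise the limit.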

Final Output

docker run -v /dev/shm:/dev/shm -v /home/workspace/mapdocker:/app/google-maps-scraper/output mapdocker
Task Started
[INFO] Downloading Driver for Chrome Version 116 in build/ directory. This is a one-time process. Download in progress...
/app/google-maps-scraper/build/116/chromedriver
Running in Docker, So adding sandbox arguments
Creating Driver with window_size=1920,1080 and user_agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.37
Launched Browser
Scrolling...
Fetched 5 links.
Running in Docker, So adding sandbox arguments
Creating Driver with window_size=1920,1080 and user_agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.102 Safari/537.36
Launched Browser
Done: The GT Road
Done: AnnaMaya FoodHall - Andaz Delhi
Done: Indian Accent
Done: Olive Bar & Kitchen
Done: Shang Palace
Filtered 5 links from 5.
View written JSON file at output/restaurants-in-delhi.json
View written CSV file at output/restaurants-in-delhi.csv
Closing Browser
Closed Browser
View Final Screenshot at tasks/4/final.png
View written JSON file at output/all.json
Task Completed!
View written JSON file at output/all.json
View written CSV file at output/all.csv

Thanks to all: @fabioselau077 @babidi34 @Chetan11-dev

Chetan11-dev commented 10 months ago

@babynew Could you share the Dockerfile to serve as a reference for the community?

Chetan11-dev commented 10 months ago

If you are encountering this issue, please note that it is advisable to run the program in Docker rather than on VPS/Cloud instances. You can find the instructions for running it in Docker here.