omkarcloud / google-maps-scraper

👋 HOLA 👋 HOLA 👋 HOLA ! ENJOY OUR GOOGLE MAPS SCRAPER 🚀 TO EFFORTLESSLY EXTRACT DATA SUCH AS NAMES, ADDRESSES, PHONE NUMBERS, REVIEWS, WEBSITES, AND RATINGS FROM GOOGLE MAPS WITH EASE! 🤖
https://www.omkar.cloud/
MIT License
863 stars, 211 forks

Scraper with FastAPI #141

Closed angelkurten closed 1 month ago

angelkurten commented 4 months ago

Description

I tried to integrate the scraper with FastAPI, but I am getting the error shown below.

Code to Reproduce (Paste main.py)

from datetime import datetime

# Import the HTTP Response class; fastapi.openapi.models.Response is an OpenAPI schema model
from fastapi import FastAPI, Response

from src import Gmaps
from src.point import point

app = FastAPI()

@app.get("/{category}/{latitude}/{longitude}")
def scrap(category: str, latitude: float, longitude: float, point_id: str):
    point.set_point_id(point_id)
    Gmaps.places(
        queries=[category],
        fields=Gmaps.ALL_FIELDS,
        max=120,
        geo_coordinates=f"{latitude}, {longitude}",
    )
    return Response(status_code=200, content="Success")
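The route above takes `category`, `latitude`, and `longitude` as path parameters and `point_id` as a query parameter. As a minimal sketch of how a client could build a matching request URL, the helper below uses only the standard library; the function name `build_scrape_url` and the base URL `http://localhost:8000` are assumptions for illustration (the port is taken from the compose file's `8000:8000` mapping).

```python
from urllib.parse import quote, urlencode


def build_scrape_url(base_url, category, latitude, longitude, point_id):
    """Build the GET URL for the /{category}/{latitude}/{longitude} route.

    build_scrape_url is a hypothetical helper, not part of the scraper.
    """
    # Percent-encode the category so values with spaces or slashes stay in one path segment.
    path = f"/{quote(category, safe='')}/{latitude}/{longitude}"
    # point_id is not in the path template, so FastAPI reads it from the query string.
    query = urlencode({"point_id": point_id})
    return f"{base_url.rstrip('/')}{path}?{query}"


if __name__ == "__main__":
    # Assumed base URL; adjust to wherever the container is reachable.
    print(build_scrape_url("http://localhost:8000", "bar", 40.41116, -3.7044, "1"))
```

With these arguments the path and query match the request recorded in the error log, `GET /bar/40.41116/-3.7044?point_id=1`.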

Dockerfile

FROM chetan1111/botasaurus:latest

# Set PYTHONUNBUFFERED so that output is not buffered
ENV PYTHONUNBUFFERED=1

# Set the working directory in the container
WORKDIR /app

# Copy only the files needed to install the dependencies first
COPY requirements.txt ./

# Install the project dependencies
RUN python -m pip install --no-cache-dir --upgrade -r requirements.txt

# Copy the rest of the project source code into the container
COPY . .

# Command to run the FastAPI application with Uvicorn
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--reload"]

Docker compose

services:
  scrapper:
    platform: linux/arm64/v8
    build: .
    shm_size: 4000m
    ports:
      - "8000:8000"
    volumes:
      - .:/app
    environment:
      - UVICORN_HOST=0.0.0.0
      - UVICORN_PORT=8000
      - UVICORN_RELOAD=True

Error

INFO: Will watch for changes in these directories: ['/app']
2024-03-06T21:29:20.158071138Z INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2024-03-06T21:29:20.158203638Z INFO: Started reloader process [1] using StatReload
2024-03-06T21:29:29.208874753Z INFO: Started server process [11]
2024-03-06T21:29:29.209013420Z INFO: Waiting for application startup.
2024-03-06T21:29:29.211119836Z INFO: Application startup complete.
2024-03-06T21:30:17.209310220Z Running
2024-03-06T21:30:20.983483013Z Chrome failed to launch. Retrying with additional server options. To add server options by default, include '--server' in your launch command.
2024-03-06T21:30:23.443837209Z INFO: 192.168.224.1:54530 - "GET /bar/40.41116/-3.7044?point_id=1 HTTP/1.1" 500 Internal Server Error
2024-03-06T21:30:23.479692709Z ERROR: Exception in ASGI application
2024-03-06T21:30:23.479729375Z Traceback (most recent call last):
2024-03-06T21:30:23.479732750Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 236, in create_selenium_driver
2024-03-06T21:30:23.479735000Z     driver = AntiDetectDriver(
2024-03-06T21:30:23.479736792Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/anti_detect_driver.py", line 33, in __init__
2024-03-06T21:30:23.479739042Z     super().__init__(*args, **kwargs)
2024-03-06T21:30:23.479740709Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
2024-03-06T21:30:23.479742625Z     super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
2024-03-06T21:30:23.479744375Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
2024-03-06T21:30:23.479746125Z     super().__init__(
2024-03-06T21:30:23.479747750Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 272, in __init__
2024-03-06T21:30:23.479749500Z     self.start_session(capabilities, browser_profile)
2024-03-06T21:30:23.479751209Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 364, in start_session
2024-03-06T21:30:23.479753084Z     response = self.execute(Command.NEW_SESSION, parameters)
2024-03-06T21:30:23.479754625Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
2024-03-06T21:30:23.479756417Z     self.error_handler.check_response(response)
2024-03-06T21:30:23.479758084Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
2024-03-06T21:30:23.479759875Z     raise exception_class(message, screen, stacktrace)
2024-03-06T21:30:23.479761542Z selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
2024-03-06T21:30:23.479765959Z   (session not created: DevToolsActivePort file doesn't exist)
2024-03-06T21:30:23.479801292Z   (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2024-03-06T21:30:23.479804584Z Stacktrace:
2024-03-06T21:30:23.479806292Z #0 0x0040007a5f83
2024-03-06T21:30:23.479813667Z #1 0x00400045ecf7
2024-03-06T21:30:23.479815459Z #2 0x00400049660e
2024-03-06T21:30:23.479817084Z #3 0x00400049326e
2024-03-06T21:30:23.479819000Z #4 0x0040004e380c
2024-03-06T21:30:23.479820875Z #5 0x0040004d7e53
2024-03-06T21:30:23.479822459Z #6 0x00400049fdd4
2024-03-06T21:30:23.479824209Z #7 0x0040004a11de
2024-03-06T21:30:23.479825792Z #8 0x00400076a531
2024-03-06T21:30:23.479827334Z #9 0x00400076e455
2024-03-06T21:30:23.479828917Z #10 0x004000756f55
2024-03-06T21:30:23.479830500Z #11 0x00400076f0ef
2024-03-06T21:30:23.479833334Z #12 0x00400073a99f
2024-03-06T21:30:23.479834917Z #13 0x004000793008
2024-03-06T21:30:23.479836500Z #14 0x0040007931d7
2024-03-06T21:30:23.479838125Z #15 0x0040007a5124
2024-03-06T21:30:23.479839667Z #16 0x004002ca1044
2024-03-06T21:30:23.479841209Z
2024-03-06T21:30:23.479842792Z
2024-03-06T21:30:23.479844375Z During handling of the above exception, another exception occurred:
2024-03-06T21:30:23.479846084Z
2024-03-06T21:30:23.479847625Z Traceback (most recent call last):
2024-03-06T21:30:23.479849250Z   File "/usr/local/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 412, in run_asgi
2024-03-06T21:30:23.479851084Z     result = await app( # type: ignore[func-returns-value]
2024-03-06T21:30:23.479852750Z   File "/usr/local/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
2024-03-06T21:30:23.479854500Z     return await self.app(scope, receive, send)
2024-03-06T21:30:23.479856250Z   File "/usr/local/lib/python3.9/site-packages/fastapi/applications.py", line 1106, in __call__
2024-03-06T21:30:23.479858084Z     await super().__call__(scope, receive, send)
2024-03-06T21:30:23.479860000Z   File "/usr/local/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
2024-03-06T21:30:23.479870750Z     await self.middleware_stack(scope, receive, send)
2024-03-06T21:30:23.479873125Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
2024-03-06T21:30:23.479874959Z     raise exc
2024-03-06T21:30:23.479876542Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
2024-03-06T21:30:23.479888250Z     await self.app(scope, receive, _send)
2024-03-06T21:30:23.479891000Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
2024-03-06T21:30:23.479892750Z     raise exc
2024-03-06T21:30:23.479894334Z   File "/usr/local/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
2024-03-06T21:30:23.479896042Z     await self.app(scope, receive, sender)
2024-03-06T21:30:23.479897709Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
2024-03-06T21:30:23.479899500Z     raise e
2024-03-06T21:30:23.479901042Z   File "/usr/local/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
2024-03-06T21:30:23.479914084Z     await self.app(scope, receive, send)
2024-03-06T21:30:23.479917084Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
2024-03-06T21:30:23.479922125Z     await route.handle(scope, receive, send)
2024-03-06T21:30:23.479924042Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
2024-03-06T21:30:23.479925792Z     await self.app(scope, receive, send)
2024-03-06T21:30:23.479927500Z   File "/usr/local/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
2024-03-06T21:30:23.479929459Z     response = await func(request)
2024-03-06T21:30:23.481045542Z   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 274, in app
2024-03-06T21:30:23.481066209Z     raw_response = await run_endpoint_function(
2024-03-06T21:30:23.481068209Z   File "/usr/local/lib/python3.9/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
2024-03-06T21:30:23.481070042Z     return await run_in_threadpool(dependant.call, **values)
2024-03-06T21:30:23.481071750Z   File "/usr/local/lib/python3.9/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
2024-03-06T21:30:23.481073542Z     return await anyio.to_thread.run_sync(func, *args)
2024-03-06T21:30:23.481075334Z   File "/usr/local/lib/python3.9/site-packages/anyio/to_thread.py", line 33, in run_sync
2024-03-06T21:30:23.481077084Z     return await get_asynclib().run_sync_in_worker_thread(
2024-03-06T21:30:23.481078750Z   File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
2024-03-06T21:30:23.481080584Z     return await future
2024-03-06T21:30:23.481082584Z   File "/usr/local/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 807, in run
2024-03-06T21:30:23.481084334Z     result = context.run(func, *args)
2024-03-06T21:30:23.481085959Z   File "/app/server.py", line 16, in scrap
2024-03-06T21:30:23.481087625Z     Gmaps.places(
2024-03-06T21:30:23.481095125Z   File "/app/src/gmaps.py", line 327, in places
2024-03-06T21:30:23.481096917Z     places_obj = scraper.scrape_places(place_data, cache = use_cache)
2024-03-06T21:30:23.481098667Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/decorators.py", line 650, in wrapper_browser
2024-03-06T21:30:23.481100459Z     current_result = run_task(data_item, False, 0)
2024-03-06T21:30:23.481102084Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/decorators.py", line 530, in run_task
2024-03-06T21:30:23.481103834Z     driver = create_selenium_driver(options, desired_capabilities)
2024-03-06T21:30:23.481105667Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 253, in create_selenium_driver
2024-03-06T21:30:23.481107500Z     return create_selenium_driver( options, desired_capabilities, attempt_download=False)
2024-03-06T21:30:23.481109250Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/create_driver_utils.py", line 236, in create_selenium_driver
2024-03-06T21:30:23.481111000Z     driver = AntiDetectDriver(
2024-03-06T21:30:23.481112625Z   File "/usr/local/lib/python3.9/site-packages/botasaurus/anti_detect_driver.py", line 33, in __init__
2024-03-06T21:30:23.481114625Z     super().__init__(*args, **kwargs)
2024-03-06T21:30:23.481116459Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
2024-03-06T21:30:23.481118167Z     super().__init__(DesiredCapabilities.CHROME['browserName'], "goog",
2024-03-06T21:30:23.481125209Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/chromium/webdriver.py", line 92, in __init__
2024-03-06T21:30:23.481145375Z     super().__init__(
2024-03-06T21:30:23.481150917Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 272, in __init__
2024-03-06T21:30:23.481153334Z     self.start_session(capabilities, browser_profile)
2024-03-06T21:30:23.481155000Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 364, in start_session
2024-03-06T21:30:23.481157500Z     response = self.execute(Command.NEW_SESSION, parameters)
2024-03-06T21:30:23.481159167Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 429, in execute
2024-03-06T21:30:23.481160917Z     self.error_handler.check_response(response)
2024-03-06T21:30:23.481162542Z   File "/usr/local/lib/python3.9/site-packages/selenium/webdriver/remote/errorhandler.py", line 243, in check_response
2024-03-06T21:30:23.481168834Z     raise exception_class(message, screen, stacktrace)
2024-03-06T21:30:23.481171417Z selenium.common.exceptions.SessionNotCreatedException: Message: session not created: Chrome failed to start: exited normally.
2024-03-06T21:30:23.481173209Z   (session not created: DevToolsActivePort file doesn't exist)
2024-03-06T21:30:23.481174875Z   (The process started from chrome location /opt/google/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
2024-03-06T21:30:23.481182459Z Stacktrace:
2024-03-06T21:30:23.481187709Z #0 0x0040007a5f83
2024-03-06T21:30:23.481190334Z #1 0x00400045ecf7
2024-03-06T21:30:23.481191959Z #2 0x00400049660e
2024-03-06T21:30:23.481193584Z #3 0x00400049326e
2024-03-06T21:30:23.481195334Z #4 0x0040004e380c
2024-03-06T21:30:23.481197000Z #5 0x0040004d7e53
2024-03-06T21:30:23.481198667Z #6 0x00400049fdd4
2024-03-06T21:30:23.481200292Z #7 0x0040004a11de
2024-03-06T21:30:23.481201875Z #8 0x00400076a531
2024-03-06T21:30:23.481203417Z #9 0x00400076e455
2024-03-06T21:30:23.481205084Z #10 0x004000756f55
2024-03-06T21:30:23.481206709Z #11 0x00400076f0ef
2024-03-06T21:30:23.481208292Z #12 0x00400073a99f
2024-03-06T21:30:23.481210375Z #13 0x004000793008
2024-03-06T21:30:23.481228542Z #14 0x0040007931d7
2024-03-06T21:30:23.481231959Z #15 0x0040007a5124
2024-03-06T21:30:23.481233584Z #16 0x004002ca1044
2024-03-06T21:30:23.481238459Z
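In a long traceback like this one, the lines that matter most are the exception lines, here `selenium.common.exceptions.SessionNotCreatedException`. The following is a rough sketch, using only the standard library, of how those lines could be pulled out of a Docker log even when the line breaks were lost in pasting. The helper names `split_log` and `find_exceptions` are hypothetical, and the timestamp pattern is an assumption based on the prefixes visible above.

```python
import re

# Assumed timestamp shape, based on the log above: 2024-03-06T21:30:23.479761542Z
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z ?")

# A dotted name ending in "Exception" or "Error", followed by ": message".
EXCEPTION_LINE = re.compile(r"^([A-Za-z_][\w.]*(?:Exception|Error)): (.*)$")


def split_log(raw):
    """Split a log into entries at each timestamp, normalizing whitespace."""
    entries = (" ".join(part.split()) for part in TIMESTAMP.split(raw))
    return [entry for entry in entries if entry]


def find_exceptions(raw):
    """Return (exception type, message) pairs in the order they appear."""
    found = []
    for entry in split_log(raw):
        match = EXCEPTION_LINE.match(entry)
        if match:
            found.append((match.group(1), match.group(2)))
    return found
```

Calling `find_exceptions` on the pasted log text would list each `SessionNotCreatedException` occurrence with its message, skipping the `INFO:` and `ERROR:` lines from Uvicorn because those prefixes do not end in `Exception` or `Error`.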

Zip and Upload the error_log/ Folder (Optional, if there are errors)

Chetan11-dev commented 4 months ago

I will be releasing an API for Gmaps, so kindly wait until then.

Chetan11-dev commented 1 month ago

I have released a new version with API integration. Kindly run this command:

python -m pip install bota botasaurus_api botasaurus_driver botasaurus-proxy-authentication botasaurus_server --upgrade

and then run the commands below:

1️⃣ Clone the Magic 🧙‍♀️:

git clone https://github.com/omkarcloud/google-maps-scraper
cd google-maps-scraper

2️⃣ Install Dependencies 📦:

python -m pip install -r requirements.txt && python run.py install

3️⃣ Launch the UI Dashboard 🚀:

python run.py