umihico / docker-selenium-lambda

The simplest demo of chrome automation by python and selenium in AWS Lambda
MIT License

Help on "Message: chrome not reachable" #164

Closed: ahirnish closed this issue 1 year ago

ahirnish commented 1 year ago

Hi @umihico,

I am using your selenium-docker solution to run a headless Chrome browser in AWS Lambda for my project. It is already a great help :)

I need a bit of support and your expert advice on this setup, if you don't mind. Here is some context on what I am doing, so that it's easier for you to advise:

I created this headless browser Lambda and use it to render some images from a URL and then save them as screenshots in S3. The whole process runs over a large number of URLs (a stream of URLs as input to this Lambda) via AWS Step Functions. Although it works, there are always about 50-60 URLs (out of 3000 in total) that run into the following messages:

I am trying to understand why this happens only for a few URLs and not for all of them. Your help would give me a better understanding.

Hope to hear back from you. Thank you.

This is my Dockerfile:

#based on https://github.com/umihico/docker-selenium-lambda
FROM public.ecr.aws/lambda/python:3.8 as build

# download the chrome binaries and unzip them to the desired location
RUN yum install -y unzip && \
    curl -Lo "/tmp/chromedriver.zip" "https://chromedriver.storage.googleapis.com/98.0.4758.48/chromedriver_linux64.zip" && \
    curl -Lo "/tmp/chrome-linux.zip" "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F950363%2Fchrome-linux.zip?alt=media" && \
    unzip /tmp/chromedriver.zip -d /opt/ && \
    unzip /tmp/chrome-linux.zip -d /opt/

# as of feb 2023 python3.8 was stable
FROM public.ecr.aws/lambda/python:3.8

# install necessary packages for running chrome
RUN yum install atk cups-libs gtk3 libXcomposite alsa-lib \
    libXcursor libXdamage libXext libXi libXrandr libXScrnSaver \
    libXtst pango at-spi2-atk libXt xorg-x11-server-Xvfb \
    xorg-x11-xauth dbus-glib dbus-glib-devel -y

# install selenium
RUN pip install selenium

COPY --from=build /opt/chrome-linux /opt/chrome
COPY --from=build /opt/chromedriver /opt/

#COPY lambda_function.py requirements.txt ./
# copy the whole build context (this already includes requirements.txt)
COPY . requirements.txt ./

COPY mraid.js ./tmp/

# install requirements
RUN python3.8 -m pip install -r requirements.txt -t .

# Command can be overwritten by providing a different command in the template directly.
# entry point
CMD ["lambda_function.lambda_handler"]

This is more of a help request than an issue. Please suggest a better way to ask for help if this is not the right place. Thank you.

x-N0 commented 1 year ago

Why not retry the ones that are failing? Is it always the same set of URLs that fails?

@ahirnish
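A minimal sketch of the retry idea, assuming each URL is handled by a function that starts and quits its own browser, so that a crashed Chrome is replaced by a fresh process on the next attempt. The helper name with_retries and the backoff values are illustrative assumptions, not part of docker-selenium-lambda.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 retry_on: Tuple[Type[BaseException], ...] = (Exception,)) -> T:
    """Call `task` up to `attempts` times, sleeping base_delay * 2**n between tries.

    `task` is expected to start (and quit) its own browser, so each retry
    begins with a fresh Chrome process rather than one that already crashed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return task()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")
```

In a handler this might look like with_retries(lambda: take_screenshot(url), retry_on=(WebDriverException,)), where take_screenshot is a hypothetical function of your own and WebDriverException comes from selenium.common.exceptions. Alternatively, Step Functions task states accept a Retry field (ErrorEquals, IntervalSeconds, MaxAttempts, BackoffRate), which retries the whole Lambda invocation as long as the handler lets the exception propagate.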

ahirnish commented 1 year ago

The input is different each time; it's not fixed. The number of affected inputs is in the range 50-100 every time. Message #2 may be more application-specific, but #1 and #3 are more tool-specific. @x-N0
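Since the failing inputs change from run to run, it may help to count failures by error message rather than by URL, which separates tool-specific errors such as "chrome not reachable" from page-specific ones. A minimal sketch, assuming the per-URL results have already been collected as (url, error message or None) pairs; that data shape and the function name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def summarize_failures(results: Iterable[Tuple[str, Optional[str]]]) -> Counter:
    """Count failures by the first line of their error message.

    `results` holds (url, error) pairs where error is None on success.
    Only the first line is used, so stack traces do not split one error
    type into many buckets.
    """
    counts: Counter = Counter()
    for _url, error in results:
        if error is None or not error.strip():
            continue  # success, or an empty message we cannot classify
        first_line = error.strip().splitlines()[0]
        counts[first_line] += 1
    return counts
```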

x-N0 commented 1 year ago

What I'd do is try different versions to see if that resolves the issue.
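ChromeDriver releases are tied to a Chrome version line (the ChromeDriver version-selection guide matches on the major, minor and build numbers), so one sanity check when switching versions is that both binaries report the same line. A minimal sketch, assuming the output of running each binary with --version is available as a string; the sample strings in the usage below are illustrative, not taken from this thread.

```python
import re
from typing import Tuple


def parse_version(output: str) -> Tuple[int, ...]:
    """Extract the first dotted version number, e.g. 'Chromium 98.0.4758.0' -> (98, 0, 4758, 0)."""
    match = re.search(r"(\d+(?:\.\d+)+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def versions_match(chrome_output: str, driver_output: str, parts: int = 1) -> bool:
    """Compare the first `parts` version components (1 = major version only, 3 = major.minor.build)."""
    return parse_version(chrome_output)[:parts] == parse_version(driver_output)[:parts]
```

Inside the image, the strings could come from subprocess.run(["/opt/chrome/chrome", "--version"], capture_output=True, text=True).stdout and the same call for /opt/chromedriver.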

Turkmen1Mehmet commented 1 year ago

mehmetturkmen@192 docker-selenium-lambda-memo % docker run -p 8080:8080 --platform=linux/amd64 my-selenium-lambda
26 Jun 2023 07:26:04,904 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
26 Jun 2023 07:26:13,082 [INFO] (rapid) extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory
26 Jun 2023 07:26:13,083 [INFO] (rapid) Configuring and starting Operator Domain
26 Jun 2023 07:26:13,084 [INFO] (rapid) Starting runtime domain
26 Jun 2023 07:26:13,088 [WARNING] (rapid) Cannot list external agents error=open /opt/extensions: no such file or directory
START RequestId: 0c4ae207-b6d5-425e-8a02-ca28fb990da5 Version: $LATEST
26 Jun 2023 07:26:13,094 [INFO] (rapid) Starting runtime without AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN , Expected?: false
Chromedriver version: None
Chromedriver path: /opt/chromedriver

[ERROR] WebDriverException: Message: unknown error: Chrome failed to start: crashed.
(chrome not reachable)
(The process started from chrome location /opt/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
0 0x0040006d74e3
1 0x004000406c76
2 0x00400042fd78
3 0x00400042c029
4 0x00400046accc
5 0x00400046a47f
6 0x004000461de3
7 0x0040004372dd
8 0x00400043834e
9 0x0040006973e4
10 0x00400069b3d7
11 0x0040006a5b20
12 0x00400069c023
13 0x00400066a1aa
14 0x0040006c06b8
15 0x0040006c0847
16 0x0040006d0243
17 0x004002aaf44b start_thread

  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/chromium/webdr
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/errorha
    raise exception_class(message, screen, stacktrace)
END RequestId: 1d423694-c9a9-41c7-a660-4badff838b6a
REPORT RequestId: 1d423694-c9a9-41c7-a660-4badff838b6a Init Duration: 2.28 ms Duration: 16456.85 ms Billed Duration: 16457 ms Memory Size: 3008 MB Max Memory Used: 3008 MB

I'm getting this error right now. Do you have any suggestions for a solution?
@ahirnish @umihico
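"Chrome failed to start: crashed" says that ChromeDriver launched the binary and then found the process gone, which hides Chrome's own error output. One way to see that output (for example a missing shared library) is to run the binary directly inside the container and capture what it prints. A minimal sketch; the path /opt/chrome/chrome comes from the log above, while probe_binary and its return convention are hypothetical.

```python
import subprocess
from typing import Sequence, Tuple


def probe_binary(path: str, args: Sequence[str] = ("--version",),
                 timeout: float = 30.0) -> Tuple[int, str]:
    """Run `path` with `args` and return (exit code, combined stdout/stderr).

    A non-zero exit code together with a message such as "error while loading
    shared libraries" points at the image contents rather than at Selenium.
    subprocess.TimeoutExpired is left to propagate if the binary hangs.
    """
    try:
        proc = subprocess.run([path, *args], capture_output=True, text=True,
                              timeout=timeout)
    except FileNotFoundError:
        return 127, f"{path}: not found"
    except PermissionError:
        return 126, f"{path}: not executable"
    return proc.returncode, (proc.stdout + proc.stderr).strip()
```

Inside the container, probe_binary("/opt/chrome/chrome") and probe_binary("/opt/chromedriver") would show whether each binary can run at all before Selenium is involved.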

umihico commented 1 year ago

@ahirnish @Turkmen1Mehmet

The code shown above is too different from mine.

If my original code works fine locally, you can modify it step by step until it becomes your file. Then you can tell us which added line starts causing the bug, instead of just pasting the entire file.

Turkmen1Mehmet commented 1 year ago

@ahirnish @umihico First of all, I made a small change to the code and ran it piece by piece. This is the WebDriver.__init__() error I got:

mehmetturkmen@192 docker-selenium-lambda-memo % docker run -p 8080:8080 --platform=linux/amd64 my-selenium-lambda
26 Jun 2023 10:47:27,361 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
26 Jun 2023 10:47:38,253 [INFO] (rapid) extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory
26 Jun 2023 10:47:38,255 [INFO] (rapid) Configuring and starting Operator Domain
26 Jun 2023 10:47:38,257 [INFO] (rapid) Starting runtime domain
START RequestId: c3d5437a-a8af-484c-9473-16eda043a525 Version: $LATEST
26 Jun 2023 10:47:38,261 [WARNING] (rapid) Cannot list external agents error=open /opt/extensions: no such file or directory
26 Jun 2023 10:47:38,263 [INFO] (rapid) Starting runtime without AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN , Expected?: false
[ERROR] TypeError: WebDriver.__init__() got multiple values for argument 'option
    chrome = webdriver.Chrome("/opt/chromedriver",options=options)
END RequestId: 495b85d4-3693-4dd9-826c-e4207ceb8b64
REPORT RequestId: 495b85d4-3693-4dd9-826c-e4207ceb8b64 Init Duration: 1.77 ms Duration: 1566.42 ms Billed Duration: 1567 ms Memory Size: 3008 MB Max Memory Used: 3008 MB
^C26 Jun 2023 10:56:08,387 [INFO] (rapid) Received signal signal=interrupt
26 Jun 2023 10:56:08,387 [INFO] (rapid) Shutting down...
26 Jun 2023 10:56:08,390 [WARNING] (rapid) Reset initiated: SandboxTerminated
26 Jun 2023 10:56:08,391 [INFO] (rapid) Sending SIGKILL to runtime-1(17).
26 Jun 2023 10:56:08,413 [INFO] (rapid) Stopping runtime domain
26 Jun 2023 10:56:08,415 [INFO] (rapid) Waiting for runtime domain processes termination
26 Jun 2023 10:56:08,416 [INFO] (rapid) Stopping operator domain
26 Jun 2023 10:56:08,416 [INFO] (rapid) Starting runtime domain
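The TypeError in this log is consistent with Selenium 4.10 or later, where webdriver.Chrome no longer accepts executable_path and its first positional parameter is options. The chromedriver path passed positionally is bound to options, and options=options then supplies the same argument a second time. FakeChrome below is a hypothetical stand-in with that signature which reproduces the message without Selenium; make_chrome sketches the Service-based call instead, assuming Selenium 4.10+ is installed.

```python
class FakeChrome:
    """Hypothetical stand-in mirroring Chrome(options=None, service=None, keep_alive=True)."""

    def __init__(self, options=None, service=None, keep_alive=True):
        self.options = options
        self.service = service
        self.keep_alive = keep_alive


def reproduce_type_error() -> str:
    """Call the stand-in the same way as the failing line and return the error text."""
    try:
        FakeChrome("/opt/chromedriver", options=object())  # the path lands in `options`
    except TypeError as exc:
        return str(exc)
    return ""


def make_chrome(options, chromedriver_path: str = "/opt/chromedriver"):
    """Pass the driver path through a Service object instead of positionally."""
    # Imported lazily so this module still loads where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path),
                            options=options)
```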

I also updated the code. This is the error I got this time:

mehmetturkmen@192 docker-selenium-lambda-memo % docker run -p 8080:8080 --platform=linux/amd64 my-selenium-lambda
26 Jun 2023 11:07:18,953 [INFO] (rapid) exec '/var/runtime/bootstrap' (cwd=/var/task, handler=)
26 Jun 2023 11:07:24,248 [INFO] (rapid) extensionsDisabledByLayer(/opt/disable-extensions-jwigqn8j) -> stat /opt/disable-extensions-jwigqn8j: no such file or directory
26 Jun 2023 11:07:24,251 [INFO] (rapid) Configuring and starting Operator Domain
26 Jun 2023 11:07:24,251 [INFO] (rapid) Starting runtime domain
26 Jun 2023 11:07:24,258 [WARNING] (rapid) Cannot list external agents error=open /opt/extensions: no such file or directory
START RequestId: 6637e50c-74dc-4eb4-9484-986025c641a9 Version: $LATEST
26 Jun 2023 11:07:24,266 [INFO] (rapid) Starting runtime without AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN , Expected?: false
[ERROR] WebDriverException: Message: unknown error: Chrome failed to start: crashed.
(chrome not reachable)
(The process started from chrome location /opt/chrome/chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Stacktrace:
0 0x0040006d74e3
1 0x004000406c76
2 0x00400042fd78
3 0x00400042c029
4 0x00400046accc
5 0x00400046a47f
6 0x004000461de3
7 0x0040004372dd
8 0x00400043834e
9 0x0040006973e4
10 0x00400069b3d7
11 0x0040006a5b20
12 0x00400069c023
13 0x00400066a1aa
14 0x0040006c06b8
15 0x0040006c0847
16 0x0040006d0243
17 0x004002aaf44b start_thread

  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/chrome/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/chromium/webdr
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/webdriv
  File "/var/lang/lib/python3.10/site-packages/selenium/webdriver/remote/errorha
    raise exception_class(message, screen, stacktrace)
END RequestId: 7ab9eb86-0a3f-42a9-8bfc-c4cd9fc5acc7
REPORT RequestId: 7ab9eb86-0a3f-42a9-8bfc-c4cd9fc5acc7 Init Duration: 4.71 ms Duration: 17264.78 ms Billed Duration: 17265 ms Memory Size: 3008 MB Max Memory Used: 3008 MB

Current main.py code

from selenium import webdriver
from tempfile import mkdtemp
from selenium.webdriver.common.by import By
import os
import sys
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.color import Color
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.chrome.options import Options
import re
import requests
import time
from bs4 import BeautifulSoup
import sqlite3
import urllib.request
import zipfile

def handler(event=None, context=None):
    chromedriver_path = "/opt/chromedriver"
    agent = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'}

    op = webdriver.ChromeOptions()
    op.binary_location = '/opt/chrome/chrome'
    op.add_argument('--headless')
    op.add_argument('--no-sandbox')
    op.add_argument("--disable-gpu")
    op.add_argument("--window-size=1280x1696")
    op.add_argument("--single-process")
    op.add_argument("--disable-dev-shm-usage")
    op.add_argument("--disable-dev-tools")
    op.add_argument("--no-zygote")
    op.add_argument(f"--user-data-dir={mkdtemp()}")
    op.add_argument(f"--data-path={mkdtemp()}")
    op.add_argument(f"--disk-cache-dir={mkdtemp()}")
    op.add_argument("--remote-debugging-port=9222")
    op.add_argument(f"--executable-path={chromedriver_path}")

    browser = webdriver.Chrome(options=op)
    browser.get("https://www.zara.com/tr/")
    return browser.find_element(by=By.XPATH, value="//html").text
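In main.py above, the chromedriver path is passed to Chrome as an --executable-path command-line switch. As far as I can tell Chrome has no such switch, so it does not tell Selenium where chromedriver is, and the agent dictionary is defined but never used. A minimal sketch of the same handler, assuming Selenium 4.10+ and the /opt paths from this thread: the driver path goes through Service, the user agent is passed with Chrome's --user-agent switch, and the browser is always quit. chrome_arguments and fetch_page_text are hypothetical names, and this is not umihico's original code.

```python
from tempfile import mkdtemp
from typing import List, Optional

CHROME_BINARY = "/opt/chrome/chrome"     # paths taken from the Dockerfile and logs above
CHROMEDRIVER_PATH = "/opt/chromedriver"


def chrome_arguments(user_agent: Optional[str] = None) -> List[str]:
    """The switches from main.py above, minus --executable-path."""
    args = [
        "--headless",
        "--no-sandbox",
        "--disable-gpu",
        "--window-size=1280x1696",
        "--single-process",
        "--disable-dev-shm-usage",
        "--disable-dev-tools",
        "--no-zygote",
        f"--user-data-dir={mkdtemp()}",
        f"--data-path={mkdtemp()}",
        f"--disk-cache-dir={mkdtemp()}",
        "--remote-debugging-port=9222",
    ]
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args


def fetch_page_text(url: str, user_agent: Optional[str] = None) -> str:
    # Lazy imports: selenium is only needed when a browser is actually started.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.binary_location = CHROME_BINARY
    for arg in chrome_arguments(user_agent):
        options.add_argument(arg)

    # The driver path goes to Service, not onto Chrome's command line.
    browser = webdriver.Chrome(service=Service(executable_path=CHROMEDRIVER_PATH),
                               options=options)
    try:
        browser.get(url)
        return browser.find_element(by=By.XPATH, value="//html").text
    finally:
        browser.quit()  # release the Chrome process even if get() fails
```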
umihico commented 1 year ago

@Turkmen1Mehmet Compared with my original code, the diff is more than 10 lines. Please try the changes between them one line at a time. You'll find that inserting a specific line causes the error; then please share the details in a new issue.
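The line-by-line comparison suggested here can be partly automated for the Chrome switches. A minimal sketch, assuming a hypothetical starts_ok(args) callable that you write yourself to launch Chrome with the given switches and report whether a session started (for example by wrapping the driver creation in try/except WebDriverException): it starts from a known-good list and adds the extra switches one at a time.

```python
from typing import Callable, List, Optional, Sequence


def first_breaking_change(base: Sequence[str], extra: Sequence[str],
                          starts_ok: Callable[[List[str]], bool]) -> Optional[str]:
    """Add `extra` items to `base` one by one and return the first one whose
    addition makes `starts_ok` fail, or None if every addition still works."""
    current = list(base)
    if not starts_ok(current):
        raise ValueError("the baseline configuration already fails")
    for item in extra:
        current.append(item)  # cumulative: keep everything added so far
        if not starts_ok(current):
            return item
    return None
```

Because switches are added cumulatively, the reported switch is the one that completed a failing combination, not necessarily the only culprit; removed or changed lines in the diff (such as the Chrome constructor call) still need to be checked by hand.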