ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)
https://github.com/UltrafunkAmsterdam/undetected-chromedriver
GNU General Public License v3.0
9.08k stars 1.09k forks

Can not use undetected-chromedriver with aws lambda. OSError: [Errno 38] Function not implemented #1382

Open dajiangqingzhou opened 1 year ago

dajiangqingzhou commented 1 year ago

Hello guys, I'm trying to use undetected_chromedriver on AWS Lambda, but I get the following error:

[ERROR] OSError: [Errno 38] Function not implemented
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 42, in lambda_handler
    driver = create_undetected_driver()
  File "/var/task/lambda_function.py", line 23, in create_undetected_driver
    import undetected_chromedriver as uc
  File "/opt/python/lib/python3.7/site-packages/undetected_chromedriver/__init__.py", line 43, in <module>
    from .patcher import IS_POSIX
  File "/opt/python/lib/python3.7/site-packages/undetected_chromedriver/patcher.py", line 25, in <module>
    class Patcher(object):
  File "/opt/python/lib/python3.7/site-packages/undetected_chromedriver/patcher.py", line 26, in Patcher
    lock = Lock()
  File "/var/lang/lib/python3.7/multiprocessing/context.py", line 67, in Lock
    return Lock(ctx=self.get_context())
  File "/var/lang/lib/python3.7/multiprocessing/synchronize.py", line 162, in __init__
    SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)
  File "/var/lang/lib/python3.7/multiprocessing/synchronize.py", line 59, in __init__
    unlink_now)

I'm using Python 3.7 with selenium==4.9.0, undetected-chromedriver==3.5.0, chrome-driver==2.31, and headlesschrome==v1.0.0-41.

Here is my code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_dir = '/opt/chrome'
chrome_path = chrome_dir + '/headless-chromium'
chrome_driver_path = chrome_dir + '/chromedriver'

def create_driver():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.binary_location = chrome_path
    return webdriver.Chrome(executable_path=chrome_driver_path, options=chrome_options)

def create_undetected_driver():
    import undetected_chromedriver as uc
    from undetected_chromedriver.options import ChromeOptions
    chrome_options = ChromeOptions()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--single-process')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--disable-gpu")
    chrome_options.binary_location = chrome_path
    return uc.Chrome(driver_executable_path=chrome_driver_path,
                       headless=True,
                       options=chrome_options,
                       use_subprocess=False,
                       user_data_dir="/tmp")

def lambda_handler(event, context):
    """The actual function that will be called"""
    driver = create_undetected_driver()
    try:
        url = 'https://nowsecure.nl'
        driver.get(url)
        image = driver.get_screenshot_as_base64()
        return {
            'headers': { "Content-Type": "image/png" },
            'statusCode': 200,
            'body': image,
            'isBase64Encoded': True
        }
    finally:
        # Always shut the browser down, even if the page load fails.
        # (The try block above always returns or raises, so nothing after
        # this point would ever run.)
        driver.quit()

It works fine when I use the create_driver method to fetch a URL.

chuanyao17 commented 1 year ago

Hi, I also had the same (or a very similar) problem while testing on AWS Lambda. (The code works in local testing.) The error:

{
  "errorMessage": "[Errno 38] Function not implemented",
  "errorType": "OSError",
  "requestId": "",
  "stackTrace": [
    " File \"/usr/local/lib/python3.7/importlib/__init__.py\", line 127, in import_module\n return _bootstrap._gcd_import(name[level:], package, level)\n",
    " File \"<frozen importlib._bootstrap>\", line 1006, in _gcd_import\n",
    " File \"<frozen importlib._bootstrap>\", line 983, in _find_and_load\n",
    " File \"<frozen importlib._bootstrap>\", line 967, in _find_and_load_unlocked\n",
    " File \"<frozen importlib._bootstrap>\", line 677, in _load_unlocked\n",
    " File \"<frozen importlib._bootstrap_external>\", line 728, in exec_module\n",
    " File \"<frozen importlib._bootstrap>\", line 219, in _call_with_frames_removed\n",
    " File \"/function/lambda_function.py\", line 4, in <module>\n import undetected_chromedriver as uc\n",
    " File \"/function/undetected_chromedriver/__init__.py\", line 43, in <module>\n from .patcher import IS_POSIX\n",
    " File \"/function/undetected_chromedriver/patcher.py\", line 25, in <module>\n class Patcher(object):\n",
    " File \"/function/undetected_chromedriver/patcher.py\", line 26, in Patcher\n lock = Lock()\n",
    " File \"/usr/local/lib/python3.7/multiprocessing/context.py\", line 67, in Lock\n return Lock(ctx=self.get_context())\n",
    " File \"/usr/local/lib/python3.7/multiprocessing/synchronize.py\", line 162, in __init__\n SemLock.__init__(self, SEMAPHORE, 1, 1, ctx=ctx)\n",
    " File \"/usr/local/lib/python3.7/multiprocessing/synchronize.py\", line 59, in __init__\n unlink_now)\n"
  ]
}

My Dockerfile:

# Define custom function directory
ARG FUNCTION_DIR="/function"

FROM python:3.7.8 as build-image

# Include global arg in this stage of the build
ARG FUNCTION_DIR

# Copy function code
RUN mkdir -p ${FUNCTION_DIR}
COPY . ${FUNCTION_DIR}

# Install the function's dependencies
RUN pip install --target ${FUNCTION_DIR} selenium==4.9.1 awslambdaric undetected-chromedriver

# Start the final image from the same base Python image (note: this is the full image, not a slim variant)
FROM python:3.7.8

# Include global arg in this stage of the build
ARG FUNCTION_DIR

# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}

# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}

# Install Chrome 
RUN apt-get update && apt-get install -y wget gnupg2 && \
    wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - && \
    echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list && \
    apt-get update && apt-get install -y google-chrome-stable && \
    rm -rf /var/lib/apt/lists/*

# Install Chrome driver
RUN wget -q -O /tmp/chromedriver.zip https://chromedriver.storage.googleapis.com/$(curl -sS chromedriver.storage.googleapis.com/LATEST_RELEASE)/chromedriver_linux64.zip && \
    unzip /tmp/chromedriver.zip -d /usr/local/bin/ && \
    rm /tmp/chromedriver.zip

# Set runtime interface client as default command for the container runtime
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]

# Pass the name of the function handler as an argument to the runtime
CMD [ "lambda_function.handler" ]

My code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import selenium
import undetected_chromedriver as uc

def handler(event, context):

    print("selenium",selenium.__version__)
    print("test!!!!")

I haven't done any scraping yet, and I still get the error just from importing undetected_chromedriver.

arik103 commented 11 months ago

I have the same issue. Any solutions to this? Thanks!

panterozo commented 11 months ago

Same issue here

grantfuhr commented 11 months ago

I ran into the same issue. I believe this is because Lambda can't use every feature of Python's multiprocessing standard-library module. According to this article:

Due to the Lambda execution environment not having /dev/shm (shared memory for processes) support, you can’t use multiprocessing.Queue or multiprocessing.Pool

You're probably better off running this in your own container with Fargate.
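For anyone who wants to confirm the diagnosis, or who has to stay on Lambda, here is a minimal sketch of a probe plus one possible workaround. Both tracebacks above fail at `lock = Lock()` in patcher.py, which runs at import time, so the sketch checks whether `multiprocessing.Lock()` can be created and, if it cannot, swaps in `threading.Lock` before undetected_chromedriver is imported. The helper names `semaphores_available` and `install_lock_fallback` are hypothetical, and the swap rests on an assumption: that the library only needs the lock inside a single process. This is not something the maintainers have confirmed, and it has not been tested on Lambda here.

```python
import multiprocessing
import threading


def semaphores_available() -> bool:
    """Return True if this environment can create a multiprocessing.Lock."""
    try:
        multiprocessing.Lock()
        return True
    except (OSError, ImportError):
        # OSError [Errno 38] is what Lambda raises in the tracebacks above;
        # ImportError covers platforms built without sem_open support.
        return False


def install_lock_fallback() -> bool:
    """Replace multiprocessing.Lock with threading.Lock if semaphores are missing.

    Returns True if the fallback was installed, False if nothing was changed.
    ASSUMPTION: a thread-level lock is sufficient for the importing library.
    """
    if semaphores_available():
        return False
    multiprocessing.Lock = threading.Lock
    return True


if __name__ == "__main__":
    patched = install_lock_fallback()
    print("fallback installed:", patched)
    # Only after this point would one try:
    #   import undetected_chromedriver as uc
```

The call has to happen before the first `import undetected_chromedriver`, because `from multiprocessing import Lock` binds whatever `multiprocessing.Lock` is at that moment. Even if the import then succeeds, other parts of the driver startup may still hit Lambda's limits, so running the job in your own container (for example on Fargate) remains the safer route.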

cimentadaj commented 4 days ago

Same issue here