Open Sagaryal opened 2 years ago
@Sagaryal Hi man, I ran into the same problem today. Did you find a solution?
@blacksam07 Fortunately yes, after some peeking into the source code and at this comment and its reply.
Below is my working code. I have explained the details in a separate comment right after this one to keep the solution short and precise. Hope it helps you too.
Dockerfile:
FROM python:3.10-alpine
ENV PYTHONUNBUFFERED True
# chromium is not found inside docker, so need to install it.
RUN apk add --update make gcc g++ libc-dev chromium chromium-chromedriver
WORKDIR /app
COPY requirements.txt ./
RUN pip install -r requirements.txt
EXPOSE 8000
COPY . ./
CMD exec uvicorn main:app --host 0.0.0.0 --port 8000
main.py:
import json
from fastapi import FastAPI
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from utils import Item
app = FastAPI()
'''
Even after installing chromedriver and the browser, uc.Chrome() raises another bizarre error.
It turns out you need to pass the driver path explicitly.
chromedriver path: /usr/bin/chromedriver
chrome browser path: /usr/bin/chromium-browser (found automagically by the library)
Running this from a local venv will no longer work; see the explanation in the next comment.
'''
driver = uc.Chrome(headless=True, driver_executable_path='/usr/bin/chromedriver')
@app.post("/")
def root(item: Item):
    # known url using cloudflare's "under attack mode"
    driver.get(item.url)
    html = driver.find_element(By.TAG_NAME, 'html').text
    return json.loads(html)
Detailed explanation, as a reference for the comment above
I incorrectly assumed that the package would also download the Chromium driver and browser, as puppeteer does.
In this line, executable is empty, i.e. None, hence the error below:
scrapper | File "/usr/local/lib/python3.10/posixpath.py", line 152, in dirname
scrapper | p = os.fspath(p)
scrapper | TypeError: expected str, bytes or os.PathLike object, not NoneType
When running locally (venv), the executable path was /usr/bin/google-chrome. In Docker, however, no chromedriver or browser is installed, either by the package or by us, and the selenium webdriver needs both to open the browser and perform our tasks. 😐
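A minimal sketch of why the traceback ends in posixpath.dirname: when no browser binary is found, the path is None, and os.path.dirname(None) raises exactly this TypeError. The helper names find_browser and browser_dir and the list of binary names are my own illustration, not part of undetected_chromedriver. Checking up front simply gives a clearer message.

```python
import os
import shutil


def find_browser(names=("google-chrome", "chromium", "chromium-browser")):
    """Return the first browser binary found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None


def browser_dir(executable):
    """Directory of the browser binary, with a readable error instead of a TypeError."""
    if executable is None:
        raise RuntimeError(
            "No Chrome/Chromium binary found; install one in the image "
            "(for example with apk add chromium)"
        )
    return os.path.dirname(executable)
```

Calling browser_dir(find_browser()) before creating the driver would fail early with the RuntimeError in an image that has no browser installed.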
Trying to install chromium and chromium-driver in the python:3.10-slim image always gave me a Debian connection error. 😶
So I ended up using an Alpine image with some additional packages so that the requirements.txt packages install smoothly.
FROM python:3.10-alpine
RUN apk add --update make gcc g++ libc-dev chromium chromium-chromedriver
Hoping that it would now run smoothly 😁, I was shattered by yet another cryptic error 💔
With the help of this comment and the Readme I tried setting executable_path and browser_executable_path, but ran into the same error.
Then I got my hands dirty again by diving into the source code and found that there is no executable_path argument, but there is a driver_executable_path, which was neither mentioned anywhere nor found automagically like the browser executable. 😑
After setting that argument to the chromedriver path, driver_executable_path='/usr/bin/chromedriver', voila, it worked. 🤩 🥳
The reason driver_executable_path is not mentioned or normally needed is that the program creates a driver such as
/root/.local/share/undetected_chromedriver/739aa58183d6f966_chromedriver
every time you start the program. In Docker, however, it did not create that chromedriver or otherwise download/install it (I haven't looked at that part of the code). So we need to install chromedriver manually and provide its path.
As a consequence, running it locally (venv) will now fail, because we hard-coded the driver path to /usr/bin/chromedriver, where no chromedriver exists. For now you may need to copy any recent driver from ~/.local/share/undetected_chromedriver/ to /usr/bin/ and give it executable permission on your local machine.
Hope it's clear now. Thanks
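One way to avoid copying a driver into /usr/bin on the local machine is to pass the path only when a driver actually exists there. This is a sketch under assumptions: the CHROMEDRIVER_PATH environment variable and the resolve_driver_path helper are hypothetical names of mine, and I am assuming that passing driver_executable_path=None behaves like omitting the argument, so the library falls back to managing its own driver.

```python
import os


def resolve_driver_path(env_var="CHROMEDRIVER_PATH", default="/usr/bin/chromedriver"):
    """Pick a chromedriver path, or None to let the library manage its own."""
    for candidate in (os.environ.get(env_var), default):
        # Accept only an existing, executable file
        if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Usage (requires undetected_chromedriver, not imported here):
# driver = uc.Chrome(headless=True, driver_executable_path=resolve_driver_path())
```

In the Alpine image this would return /usr/bin/chromedriver, while in a venv without that file it would return None.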
@ultrafunkamsterdam Sir, is there any reason why no chromedriver is downloaded/created in /root/.local/share/undetected_chromedriver?
I suggest using the official Docker image (Docker Hub username: ultrafunk) and reading its readme.
@Sagaryal Oh yes, yesterday I noticed that Chrome was not installed and tried installing it, but I was not aware of the problem with the chromedriver path 😞. Thanks for your explanation, this solution is working for me 👏🏽.
I use my own Docker image because I need to set up more things to deploy on AWS Lambda.
@blacksam07 Have you fixed the issue? I am working on the same thing: building a custom Docker image to be used by AWS Lambda.
@ultrafunkamsterdam Would you mind providing a Dockerfile for that image?
@yaguangtang Sorry man, I forgot to reply to your message. Yes, I was able to solve the problem and create my own Docker image. This is the Dockerfile:
# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.16"
# Stage 1 - bundle base image + runtime
# Grab a fresh copy of the image and install GCC
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
# Install GCC (Alpine uses musl but we compile and link dependencies with GCC)
RUN apk add --no-cache \
libstdc++
# Stage 2 - build function and dependencies
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
build-base \
libtool \
autoconf \
automake \
libexecinfo-dev \
make \
cmake \
libcurl \
curl \
gcc \
g++
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy required files
COPY patcher.py ${FUNCTION_DIR}
COPY function_name.py ${FUNCTION_DIR}
COPY requirements.txt .
# Optional – Install the function's dependencies
RUN python${RUNTIME_VERSION} -m pip install --upgrade pip
RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
# Fix undetected_chromedriver to use in lambda
RUN cd ${FUNCTION_DIR} && cp -f patcher.py ${FUNCTION_DIR}/undetected_chromedriver
# Install Lambda Runtime Interface Client for Python
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}
# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-alpine
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# chromedriver is packaged separately from chromium on Alpine
RUN apk add --no-cache chromium chromium-chromedriver
RUN cp /usr/bin/chromedriver ${FUNCTION_DIR}
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
COPY entry.sh /
RUN chmod 755 /usr/bin/aws-lambda-rie /entry.sh
ENTRYPOINT [ "/entry.sh" ]
CMD [ "function.handler" ]
entry.sh:
#!/bin/sh
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
    exec /usr/bin/aws-lambda-rie /usr/local/bin/python -m awslambdaric $1
else
    exec /usr/local/bin/python -m awslambdaric $1
fi
The patcher file is the same as in PR #643. That PR is not merged, but you need its change to run on AWS. Remember that you need to create an EC2 instance and deploy to AWS from that instance. If you need more help, write to me.
@blacksam07 I used your Dockerfile but I get: Error: fork/exec /entry.sh: no such file or directory. Do you know the reason for this?
@esamhassan1 It's possible that the entry.sh file is not in the same folder as the Dockerfile, so it is not copied into the Docker image.
@blacksam07 I have it, otherwise the build would have raised an error, but I get this error when trying to run it on AWS Lambda. Did you run it yourself on AWS?
I solved it using the direct path in the entrypoint, ENTRYPOINT ["/usr/local/bin/python", "-m", "awslambdaric"], but now I get this error:
{ "errorMessage": "[Errno 30] Read-only file system: '/home/sbx_user1051'", "errorType": "OSError", "requestId": "ceaeedc7-b520-457e-96fd-e2020b26c5ef", "stackTrace": [ " File \"/home/app/app.py\", line 34, in lambda_handler\n driver = uc.Chrome(headless=True\n", " File \"/home/app/undetected_chromedriver/init.py\", line 235, in init\n patcher = Patcher(\n", " File \"/home/app/undetected_chromedriver/patcher.py\", line 66, in init\n os.makedirs(self.data_path, exist_ok=True)\n", " File \"/usr/local/lib/python3.9/os.py\", line 215, in makedirs\n makedirs(head, exist_ok=exist_ok)\n", " File \"/usr/local/lib/python3.9/os.py\", line 215, in makedirs\n makedirs(head, exist_ok=exist_ok)\n", " File \"/usr/local/lib/python3.9/os.py\", line 215, in makedirs\n makedirs(head, exist_ok=exist_ok)\n", " File \"/usr/local/lib/python3.9/os.py\", line 225, in makedirs\n mkdir(name, mode)\n" ] }
@esamhassan1 Yes, with this config I can run on AWS without problems.
This error occurs because you need to change patcher.py according to PR #643.
@blacksam07 I did that but still get the error. I suspect I still have a problem with the path: even with the new patcher file it tries to write to the wrong directory.
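For anyone hitting the same Read-only file system error: on AWS Lambda only /tmp is writable, and the failing path /home/sbx_user1051 looks like the user's home directory. The sketch below assumes the library derives its data path from the home directory (the traceback suggests this, but I have not confirmed it) and shows how pointing HOME at a writable directory changes what os.path.expanduser returns. The helper name data_dir_with_home is hypothetical, and this is not a verified replacement for the change in PR #643.

```python
import os
import tempfile


def data_dir_with_home(home=None):
    """Point HOME at a writable directory and return the resulting data path."""
    home = home or tempfile.gettempdir()  # typically /tmp, which is writable on Lambda
    os.environ["HOME"] = home
    # On POSIX, expanduser resolves "~" from the HOME environment variable
    return os.path.expanduser("~/.local/share/undetected_chromedriver")
```

If this assumption holds, it would have to run before uc.Chrome(...) is created, since the directory is created while the driver is being set up.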
My problem was similar. I was getting some zip-related errors, and Sagaryal's answer pointed me to the solution! Thank you.
To be more precise for future readers, I was trying to use multiprocessing (more than one worker) in my FastAPI application (which is quite complex).
But I was getting this traceback:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/archive.py", line 39, in __extract_zip
archive.extractall(to_directory)
File "/usr/local/lib/python3.9/zipfile.py", line 1642, in extractall
self._extract_member(zipinfo, path, pwd)
File "/usr/local/lib/python3.9/zipfile.py", line 1695, in _extract_member
with self.open(member, pwd=pwd) as source, \
File "/usr/local/lib/python3.9/zipfile.py", line 1529, in open
raise BadZipFile("Truncated file header")
zipfile.BadZipFile: Truncated file header
test_design_4-ds_api-1 |
During handling of the above exception, another exception occurred:
test_design_4-ds_api-1 |
Traceback (most recent call last):
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.9/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
target(sockets=sockets)
File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
config.load()
File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 477, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/api/./main.py", line 206, in <module>
model_instance = model.Model.from_path(MODEL_DIR)
File "/api/./model.py", line 2286, in from_path
ChromeDriverManager().install()
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/chrome.py", line 39, in install
driver_path = self._get_driver_path(self.driver)
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/manager.py", line 31, in _get_driver_path
binary_path = self.driver_cache.save_file_to_cache(driver, file)
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/driver_cache.py", line 46, in save_file_to_cache
files = archive.unpack(path)
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/archive.py", line 30, in unpack
return self.__extract_zip(directory)
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/archive.py", line 41, in __extract_zip
if e.args[0] not in [26, 13] and e.args[1] not in [
IndexError: tuple index out of range
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/webdriver_manager/core/archive.py", line 39, in __extract_zip
archive.extractall(to_directory)
File "/usr/local/lib/python3.9/zipfile.py", line 1642, in extractall
self._extract_member(zipinfo, path, pwd)
File "/usr/local/lib/python3.9/zipfile.py", line 1697, in _extract_member
shutil.copyfileobj(source, target)
File "/usr/local/lib/python3.9/shutil.py", line 205, in copyfileobj
buf = fsrc_read(length)
File "/usr/local/lib/python3.9/zipfile.py", line 924, in read
data = self._read1(n)
File "/usr/local/lib/python3.9/zipfile.py", line 992, in _read1
data += self._read2(n - len(data))
File "/usr/local/lib/python3.9/zipfile.py", line 1027, in _read2
raise EOFError
EOFError
But the problem was quite simple. Somewhere in my code I was installing the ChromeDriver via ChromeDriverManager from the webdriver_manager.chrome module, and it tried to install it in each worker (spawned process). To do that, each worker had to download the webdriver (which comes as a zip file) and unzip it. Since they all tried to access the zip at the same time, they conflicted.
The solution, thanks again to Sagaryal for the enlightenment, turned out to be quite simple: install the ChromeDriver at build time by adding the following lines to my Dockerfile:
RUN pip install webdriver-manager==3.8.5
RUN python -c "from webdriver_manager.chrome import ChromeDriverManager; from os import environ; print(ChromeDriverManager(version=environ['CHROMEDRIVE_VERSION']).install())"
I pin the version of my ChromeDriver for compatibility purposes, but if one is comfortable using the latest, just adding:
RUN pip install webdriver-manager
RUN python -c "from webdriver_manager.chrome import ChromeDriverManager; print(ChromeDriverManager().install())"
suffices. Hope it helps someone having the same problem I had :)
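If installing at build time is not an option, another way to avoid the race is to serialize the install across workers with a file lock. This is only a sketch: install_once, the cache file and the lock file are hypothetical names of mine, it relies on fcntl and therefore works on Linux/macOS only, and the build-time install above remains the simpler fix.

```python
import fcntl
import os


def install_once(install, cache_file, lock_file):
    """Run install() in only one process; the others wait and reuse its result."""
    with open(lock_file, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until other workers release it
        try:
            if os.path.exists(cache_file):
                # Another worker already installed the driver
                with open(cache_file) as cached:
                    return cached.read()
            driver_path = install()
            with open(cache_file, "w") as cached:
                cached.write(driver_path)
            return driver_path
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


# Usage (requires webdriver_manager, not imported here):
# path = install_once(lambda: ChromeDriverManager().install(),
#                     "/tmp/chromedriver.path", "/tmp/chromedriver.lock")
```

The first worker to take the lock downloads and unzips the driver. The others block on the lock and then read the cached path instead of touching the zip.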
Continuation of #740
This issue has been created again because no response was received to the comment on the previously closed ticket. I am pasting the code and the issue here again.
@ultrafunkamsterdam Sir, please do check the edited code above once more. It's not because of plain html to json.
As you can see, even a plain return response is not working. Furthermore, the root() function executes only when the API is called, but Chrome initialization happens before that, and the code is not even getting that far, as you can see in the error logs.
"Before Chrome Initiasation ------>" is being printed but not "After Chrome Initiasation ------>", which means that Chrome is not being initialized. Also, the error logs point to an error in the line below in your code:
Nevertheless, I have tried and edited the code and the error response above for your reference.
Also please note that THIS IS WORKING outside Docker.
The code below works fine when running locally using virtualenv. But when I dockerized it, I got an error. As far as I could debug, the error comes from
driver = uc.Chrome(headless=True)
Python Versions tried: 3.10, 3.8
Error:
Dockerfile