Open vikaskookna opened 1 week ago
I had the same issue and bypassed it by setting DB_PATH to '/tmp/' (the only writable dir in AWS Lambda) before importing the crawl4ai package. My solution:
import os
from pathlib import Path

# /tmp is the only writable location in the Lambda runtime
os.makedirs('/tmp/.crawl4ai', exist_ok=True)
DB_PATH = '/tmp/.crawl4ai/crawl4ai.db'

# Redirect the home directory *before* the import, since crawl4ai
# resolves its data dir from Path.home() at import time
Path.home = lambda: Path("/tmp")
from crawl4ai import AsyncWebCrawler
Hope this works for you as well.
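To make it clearer why the order matters: the override has to be in place before anything calls Path.home(), which is why it sits above the import in the snippet. Here is a crawl4ai-free sketch of the same redirection (the .crawl4ai names just mirror the snippet above); using staticmethod is a small hardening so instance-level calls keep working too:

```python
import os
from pathlib import Path

# Lambda only permits writes under /tmp, so create the cache dir there
os.makedirs('/tmp/.crawl4ai', exist_ok=True)

# Redirect Path.home() before importing any library that calls it;
# staticmethod keeps calls like Path('.').home() working as well
Path.home = staticmethod(lambda: Path('/tmp'))

# Anything that now builds paths off the home directory lands in /tmp
cache_dir = Path.home() / '.crawl4ai'
print(cache_dir)  # /tmp/.crawl4ai
```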
OK, I will try this. @akamf Did you create a Lambda layer or a Docker image? When I tried with a layer it exceeded the 250 MB limit; how did you manage this?
After doing what you mentioned, I got this error:
Error processing https://chatclient.ai: BrowserType.launch: Executable doesn't exist at /home/sbx_user1051/.cache/ms-playwright/chromium-1134/chrome-linux/chrome
I created a Docker image where I installed Playwright and its dependencies, and then Chromium via Playwright. The Docker image is really big though (because of Playwright, I guess), so I'm currently working on optimizing it.
But our latest Dockerfile looks like this:
FROM amazonlinux:2 AS build
RUN curl -sL https://rpm.nodesource.com/setup_16.x | bash - && \
    yum install -y nodejs gcc-c++ make python3-devel \
    libX11 libXcomposite libXcursor libXdamage libXext libXi libXtst cups-libs \
    libXScrnSaver pango at-spi2-atk gtk3 iputils libdrm nss alsa-lib \
    libgbm fontconfig freetype freetype-devel ipa-gothic-fonts
RUN npm install -g playwright && \
    PLAYWRIGHT_BROWSERS_PATH=/ms-playwright-browsers playwright install chromium
FROM public.ecr.aws/lambda/python:3.11
WORKDIR ${LAMBDA_TASK_ROOT}
COPY requirements.txt .
RUN pip3 install --upgrade pip && \
    pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}" --verbose
COPY --from=build /usr/lib /usr/lib
COPY --from=build /usr/local/lib /usr/local/lib
COPY --from=build /usr/bin /usr/bin
COPY --from=build /usr/local/bin /usr/local/bin
COPY --from=build /ms-playwright-browsers /ms-playwright-browsers
ENV PLAYWRIGHT_BROWSERS_PATH=/ms-playwright-browsers
COPY handler.py .
CMD [ "handler.main" ]
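For anyone else trying this: the AWS Lambda base images bundle the Runtime Interface Emulator, so you can smoke-test the container locally before pushing it to ECR. A rough sketch, assuming Docker is available locally (the image tag and the event payload are placeholders, adjust them to whatever handler.main expects):

```shell
# Build for the Lambda architecture (important on Apple Silicon)
docker build --platform linux/amd64 -t crawl4ai-lambda .

# Start the container; the AWS base image exposes the Runtime
# Interface Emulator on port 8080
docker run --rm -p 9000:8080 crawl4ai-lambda

# In another terminal, invoke handler.main with a test event
curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"url": "https://example.com"}'
```

This surfaces import-time and browser-launch failures in seconds instead of after a full deploy.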
I don't know if this is the best solution, but it works for us. Like I said, I'm working on some optimisation for it.
Thanks @akamf, I tried this but it gave me these errors. I'm using an M1 Mac and built the image using this command:
docker build --platform linux/amd64 -t fetchlinks .
/var/task/playwright/driver/node: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /var/task/playwright/driver/node)
/var/task/playwright/driver/node: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /var/task/playwright/driver/node)
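Not sure about your exact setup, but that error usually means the node binary bundled inside the playwright wheel was built against a newer glibc than the one in the runtime image (the amazonlinux:2-based Lambda images ship glibc 2.26, if I recall correctly, while the driver here wants 2.27/2.28). You can check what the runtime actually provides with a quick stdlib call run inside the container:

```python
import platform

# Report the C library the current runtime links against; on the
# amazonlinux:2-based Lambda images this is older than the GLIBC_2.28
# the Playwright node driver requires
lib, version = platform.libc_ver()
print(lib, version)  # e.g. "glibc 2.26" inside the Python 3.11 base image
```

If the versions don't line up, moving to a base image with a newer glibc, or pinning an older playwright release whose driver was built against the image's glibc, might be the way out.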
Hi @vikaskookna @akamf
By next week, I will create the Dockerfile and also upload the Docker image to Docker Hub. I hope this can help you as well.
I created an AWS Lambda Docker image, and it fails on this line: from crawl4ai import AsyncWebCrawler