bidoubiwa closed this issue 1 year ago
I don't think it would be easily possible without doing a PR on Crawlee.
@bidoubiwa I confirmed what @qdequele said. It is not possible to use connect()
from Puppeteer, since the browser instances are managed by Crawlee.
So, what I suggest is:
Instead of going to plan B, I want to know whether the hard limits DigitalOcean imposes on us (1 GB of RAM and 15 min per function/serverless call) are enough. I suggest running the benchmark ASAP, so we can quickly discard the serverless option from our planning if needed.
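For the benchmark, a minimal sketch of how we could track peak memory while the crawler runs and compare it against the 1 GB limit (the sampling interval and helper names here are assumptions, not part of our codebase):

```javascript
// Sketch: sample process RSS periodically during the crawl and record the peak,
// so we can compare it to DigitalOcean's 1 GB serverless limit.
function formatMB(bytes) {
  return (bytes / 1024 / 1024).toFixed(1) + ' MB';
}

let peakRss = 0;
const timer = setInterval(() => {
  const { rss } = process.memoryUsage();
  if (rss > peakRss) peakRss = rss;
}, 5000);
timer.unref(); // don't keep the process alive just for the sampler

// When the crawl finishes:
// clearInterval(timer);
// console.log('peak RSS:', formatMB(peakRss)); // > 1024 MB => serverless is out
```

Running the crawler under `/usr/bin/time -v` would give the same peak-RSS number without touching the code, if we prefer that.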
If running the crawler does take more than 1 GB, we may go for a single server or k8s jobs.
An alternative before jumping to k8s jobs is paying for the most expensive Vercel plan, which gives us a maximum of 3 GB of RAM per serverless function:
https://vercel.com/docs/infrastructure/runtime-comparison#memory-size-limits
Currently, we are running the scraper in a local headless Chromium, which is very heavy on resource consumption. To avoid this, we are going to use browserless.
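The browserless setup would replace the local launch with a remote WebSocket connection. A minimal sketch, assuming the hosted browserless endpoint shape from their docs (the endpoint URL and token are placeholders, not our actual config):

```javascript
// Sketch: build the browserless WebSocket endpoint that Puppeteer's connect()
// would use instead of launching a local headless Chromium.
// The host and token query parameter are assumptions based on browserless docs.
function makeBrowserlessEndpoint(token) {
  return `wss://chrome.browserless.io?token=${encodeURIComponent(token)}`;
}

// Usage (not run here; requires a real token and the puppeteer-core package):
// const puppeteer = require('puppeteer-core');
// const browser = await puppeteer.connect({
//   browserWSEndpoint: makeBrowserlessEndpoint(process.env.BROWSERLESS_TOKEN),
// });

console.log(makeBrowserlessEndpoint('demo-token'));
```

As noted above, wiring this into Crawlee is the open question, since Crawlee manages its own browser instances rather than accepting a pre-connected browser.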