meilisearch / scrapix

MIT License

Use browserless to run the headless chrome #58

Closed by bidoubiwa 1 year ago

bidoubiwa commented 1 year ago

Currently, we run the scraper in a local headless Chromium, which consumes a lot of resources. To avoid this, we are going to use browserless.

qdequele commented 1 year ago

I don't think it would be easily possible without doing a PR on Crawlee.

brunoocasali commented 1 year ago

@bidoubiwa I confirmed what @qdequele said. It is not possible to use puppeteer's connect(), since the browser instances are handled by Crawlee.

So, what I suggest is:

Instead of going to plan B, I want to know whether the hard limits DigitalOcean imposes on us (1 GB and 15 min per function/serverless call) are enough. I suggest running the benchmark ASAP, so we can quickly discard the serverless option from our planning if they are not.

If running the crawler does take more than 1 GB, we may go for a single server or k8s jobs.
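
A minimal sketch of how such a benchmark could be measured, assuming a Node.js runtime. The names `benchmark`, `checkLimits`, and the sampling interval are hypothetical and not part of scrapix or Crawlee. One caveat: `process.memoryUsage().rss` only covers the Node.js process itself, so memory used by Chromium child processes would have to be measured separately (for example with OS-level tools) and added to the figure.

```typescript
import { performance } from "node:perf_hooks";

// Limits quoted in this thread for one function/serverless call.
const MEMORY_LIMIT_BYTES = 1024 ** 3; // 1 GB
const TIME_LIMIT_MS = 15 * 60 * 1000; // 15 min

interface LimitCheck {
  fitsMemory: boolean;
  fitsTime: boolean;
}

interface BenchmarkResult extends LimitCheck {
  peakRssBytes: number;
  elapsedMs: number;
}

// Pure comparison of measured values against the quoted limits.
function checkLimits(peakRssBytes: number, elapsedMs: number): LimitCheck {
  return {
    fitsMemory: peakRssBytes < MEMORY_LIMIT_BYTES,
    fitsTime: elapsedMs < TIME_LIMIT_MS,
  };
}

// Runs `task` while periodically sampling the resident set size (RSS)
// of this Node.js process, then reports the peak and the elapsed time.
// Assumption: child processes such as Chromium are NOT included in RSS.
async function benchmark(
  task: () => Promise<void>,
  sampleEveryMs = 100, // hypothetical sampling interval
): Promise<BenchmarkResult> {
  let peakRssBytes = process.memoryUsage().rss;
  const sample = (): void => {
    peakRssBytes = Math.max(peakRssBytes, process.memoryUsage().rss);
  };

  const timer = setInterval(sample, sampleEveryMs);
  const start = performance.now();
  try {
    await task();
  } finally {
    clearInterval(timer);
  }
  const elapsedMs = performance.now() - start;
  sample(); // take one last sample after the task finishes

  return { peakRssBytes, elapsedMs, ...checkLimits(peakRssBytes, elapsedMs) };
}
```

It could be called as `await benchmark(() => crawler.run())`, where `crawler` stands for whatever crawler instance is being evaluated. If either `fitsMemory` or `fitsTime` comes back `false`, the serverless option can be ruled out.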

brunoocasali commented 1 year ago

An alternative before jumping to k8s jobs is paying for the most expensive Vercel plan, which gives us a maximum of 3 GB of RAM for serverless:

https://vercel.com/docs/infrastructure/runtime-comparison#memory-size-limits