Closed Convalytics closed 1 year ago
@Convalytics
Hello. Thank you for sponsoring! This is also a great bug report.
I expected that randomizing the data directories to avoid collisions would be enough, but I can reintroduce my older solution, which cleaned /tmp fully.
I created a demo branch to share it. Could you try it and tell me how it goes? https://github.com/umihico/docker-selenium-lambda/commit/85dfc26f7b8af2281c8d215996d9a2869c264d05
Possibly your S3 file sizes plus Chrome's tmp usage exceed 512 MB? If so, Chrome may crash. On a warm (non-cold) start, files from prior invocations still count toward the limit, so usage can be larger than expected.
Oh man! Thank you for the repository AND this issue!
Possibly your S3 file sizes plus Chrome's tmp usage exceed 512 MB? If so, Chrome may crash. On a warm (non-cold) start, files from prior invocations still count toward the limit, so usage can be larger than expected.
It took me a lot of brute-force testing to come to this conclusion. The code in this repository works perfectly, but as soon as I swapped out https://example.com/ for almost any other website, everything started crashing. Sometimes it took until the second run, but it would eventually crash.
Increasing the storage to more than 512 MB fixed the problem, and I added your flush_tmp() function for good measure, since I'll be invoking this many times.
It could be worth mentioning in the main README.md that Chrome will randomly crash unless you allocate more than 512 MB of storage.
@humphrey
but as soon as I swapped out https://example.com/ for almost any other website, everything started crashing.
Thank you for sharing! I didn't know that. Chrome is rapidly changing its major version, so the latest versions may consume more storage than before.
I'll think about increasing the storage limit via the Serverless Framework config.
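For reference, raising the ephemeral storage limit in the Serverless Framework looks roughly like the snippet below. This is a sketch, assuming Serverless Framework v3+ and the `ephemeralStorageSize` function property; check the current Serverless docs and this repo's actual serverless.yml before copying:

```yaml
# serverless.yml (sketch, not the repo's actual config)
service: docker-selenium-lambda

provider:
  name: aws

functions:
  demo:
    # Lambda's /tmp defaults to 512 MB; raise it so Chrome's temp
    # files plus any downloads fit. Value is in MB.
    ephemeralStorageSize: 1024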
@umihico
Thank you for sharing! I didn't know that. Chrome is rapidly changing its major version, so the latest versions may consume more storage than before.
Thanks! Also, I suspect that websites are running more and more bloated JavaScript, which would use more memory. The sites I needed to test take a while to load all of that into memory before displaying.
I reckon the README.md should say something like:
If you experience errors such as Chrome crashing or not being available, you might need to increase the storage available to your Lambda function.
I'll think about increasing the storage limit via the Serverless Framework config.
Even if the value in the Serverless config is good enough for the base repo, I added a couple of my own things (such as wrapping Selenium in a GraphQL API) which take up more storage.
@humphrey I'm so sorry for keeping this issue open so long.
I added the note as you suggested and hope it helps others. Thank you again.
@umihico Is it safe to flush the temporary storage? Since AWS Lambda reuses the storage, and multiple invocations are running in parallel, wouldn't this cause unexpected issues?
For anyone facing this issue, I also had to remove the following option
options.add_argument("--remote-debugging-port=9222")
Removing it gave me far fewer "chrome not reachable" errors.
@samkit-jain I don't think temporary storage is shared by multiple invocations running at the same time. Only after an invocation's process has completely ended may a new invocation reuse that storage. That's my understanding of AWS Lambda.
Also, the current code doesn't flush storage. It just picks locations randomly, so every Lambda process gets its own clean space even when it inherits previously used storage.
@umihico Thanks!
This code works great, but I consistently get "chrome not reachable" errors if I run another instance of my Lambda within a few minutes of a prior run. I've tried manually creating /tmp folders for the three Chrome data folders as well as using mkdtemp().
For background: my processes run for about 1-2 minutes and access multiple sites. Some of them require downloading and uploading files. I've been having this issue for several months, from Chrome 99 through 103. My current "solution" is to retry after a long wait period, hoping the Lambda instance has been wiped and the next process starts on a fresh/cold instance.
I'm hoping someone can take a look at my settings and guide me in the right direction.