shelfio / libreoffice-lambda-layer

First conversion very slow #30

Open systemx-xx opened 4 years ago

systemx-xx commented 4 years ago

I am using this layer in conjunction with shelfio/aws-lambda-libreoffice to do DOCX to PDF conversion. A cold start introduces a small delay, but the first conversion after a cold start, measured from the moment /tmp/instdir/program/soffice.bin is called until the output file is ready, takes approximately 12 seconds. The second conversion drops right down to 1-2 seconds.
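
For reference, a minimal sketch of the step being timed here, assuming the handler shells out to the layer's soffice.bin directly; this is not the shelfio/aws-lambda-libreoffice API, and the helper name, flags, and input/output paths (other than the binary path quoted above) are illustrative:

```python
import os
import subprocess
import time

SOFFICE = "/tmp/instdir/program/soffice.bin"  # binary path quoted in this issue

def convert_to_pdf(docx_path: str, out_dir: str = "/tmp") -> float:
    """Convert a DOCX to PDF headlessly and return the elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(
        [
            SOFFICE,
            "--headless",
            "--norestore",
            "--convert-to", "pdf",
            "--outdir", out_dir,
            docx_path,
        ],
        check=True,
        # LibreOffice expects a writable HOME; /tmp is the writable path on Lambda.
        env={**os.environ, "HOME": "/tmp"},
    )
    return time.perf_counter() - start
```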

cgratie commented 3 years ago

I'm not using a layer but have created an image instead, and I have the same problem: the first conversion takes > 2 minutes, subsequent conversions take < 20 seconds. So the problem must be related to the way LibreOffice starts the first time.

vladholubiev commented 3 years ago

The problem is Lambda cold start: https://lumigo.io/blog/this-is-all-you-need-to-know-about-lambda-cold-starts/#:~:text=What%20are%20Lambda%20cold%20starts,a%20new%20Lambda%20worker%20handles.

The first time it runs, it loads a lot of data into memory. Check out the article for ways to deal with it.

cgratie commented 3 years ago

Thanks for the link, I had seen something similar recently, but this has more information. Indeed, the cold start is part of it, and it shows up when testing the Lambda function as the init time, which in my case is around 5 seconds for a cold start and < 1 second afterwards. But the really slow part for me is the conversion with LibreOffice, which I am timing inside the Python handler and which runs after I download the input file from S3 (the download is fast and does not vary much with cold starts).
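
A sketch of how that split can be measured inside the handler, assuming the input arrives as an S3 bucket/key pair in the event; the event field names, file paths, and binary path are hypothetical placeholders for whatever the real handler uses:

```python
import os
import subprocess
import time

import boto3

s3 = boto3.client("s3")
SOFFICE = "/tmp/instdir/program/soffice.bin"  # adjust to the image's actual path

def handler(event, context):
    t0 = time.perf_counter()
    s3.download_file(event["bucket"], event["key"], "/tmp/input.docx")
    t1 = time.perf_counter()

    subprocess.run(
        [SOFFICE, "--headless", "--convert-to", "pdf",
         "--outdir", "/tmp", "/tmp/input.docx"],
        check=True,
        env={**os.environ, "HOME": "/tmp"},
    )
    t2 = time.perf_counter()

    # The download time stays roughly constant across invocations; the
    # conversion time is the part that balloons on a cold start.
    print(f"download={t1 - t0:.2f}s convert={t2 - t1:.2f}s")
    return {"download_s": round(t1 - t0, 2), "convert_s": round(t2 - t1, 2)}
```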

I am already using -env:UserInstallation=file:///tmp, setting $HOME to /tmp, and running a fake conversion during the build process in order to create the required cache and config folders, but it does not seem to make a difference.
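
For concreteness, a sketch of what such a build-time warm-up could look like, assuming it is invoked once while building the image; the binary path and the sample-document path are illustrative and would need to match the image's actual layout:

```python
import os
import subprocess

SOFFICE = "/tmp/instdir/program/soffice.bin"  # adjust to where the image installs LibreOffice

def warm_up_profile(sample_docx: str = "/opt/warmup/sample.docx") -> None:
    """Run a throwaway conversion so the user profile, cache and config
    folders under /tmp already exist before the first real invocation."""
    subprocess.run(
        [
            SOFFICE,
            "-env:UserInstallation=file:///tmp",
            "--headless",
            "--convert-to", "pdf",
            "--outdir", "/tmp",
            sample_docx,
        ],
        check=True,
        env={**os.environ, "HOME": "/tmp"},
    )

if __name__ == "__main__":
    warm_up_profile()
```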

cgratie commented 2 years ago

I have recently updated the dependencies and re-created the Docker image for use with AWS Lambda, and the problem is still there. I was not able to reproduce the same behavior locally by running the same image and converting the same file. So the problem must be related to how AWS makes resources available during a cold start.

The partial solution we settled on was to extend the Lambda function to accept a keep-alive input, which we trigger every 30 minutes via EventBridge. In our specific case each keep-alive invocation runs in < 1 s, so this adds < 48 s of compute time per day. The difference in conversion time between warm and cold starts was larger than that, so the workaround is worth it.
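
A minimal sketch of that keep-alive short-circuit, assuming the EventBridge rule sends a constant payload such as {"keep_alive": true}; the field name and the conversion placeholder are hypothetical:

```python
def do_conversion(event):
    """Placeholder for the real download / convert / upload path."""
    raise NotImplementedError

def handler(event, context):
    # EventBridge keep-alive pings carry a marker field; returning early keeps
    # the worker warm without paying for a full conversion.
    if isinstance(event, dict) and event.get("keep_alive"):
        return {"status": "warm"}
    return do_conversion(event)
```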