vladholubiev / serverless-libreoffice

Run LibreOffice in AWS Lambda to create PDFs & convert documents
https://vladholubiev.com/serverless-libreoffice

trimmed .zip? #4

Open · fengsi opened this issue 6 years ago

fengsi commented 6 years ago

The file downloaded from the Releases tab contains everything. Do you have a trimmed .zip that can be uploaded directly to Lambda?

Thanks!

vladholubiev commented 6 years ago

@fengsi hey,

The Lambda archive is created here:

https://github.com/vladgolubev/serverless-libreoffice/blob/7c739d0a2065715285e73b0a93358bdca36a1eef/infra/lambda.tf#L23-L27

As you can see, it just takes whatever is inside the src folder and uploads it to Lambda.

The only thing you need to do is put lo.tar.gz into the src folder beforehand.


Is it what you are looking for?

fengsi commented 6 years ago

Hmm, I don't know... Isn't lo.tar.gz too large (> 100 MB) for a Lambda deployment package? The size limit is only 50 MB. I cannot upload it.

vladholubiev commented 6 years ago

@fengsi

Fortunately, the 50 MB limit is kind of a lie.

See this article for details; the real limit is 250 MB (unzipped):

https://hackernoon.com/exploring-the-aws-lambda-deployment-limits-9a8384b0bec3
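
To check a package folder against that unzipped limit before deploying, a small helper can compare its apparent size with 250 MB. This is a minimal sketch, assuming GNU du (for -b) and taking 262144000 bytes (250 × 1024 × 1024) as the limit; the function name check_unzipped_size and its optional second argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail if a directory's apparent size exceeds the
# Lambda unzipped-package limit (default 262144000 bytes = 250 MB).
# Assumes GNU coreutils du, which supports -b (apparent size in bytes).
check_unzipped_size() {
  local dir=$1
  local limit=${2:-262144000}
  local size
  size=$(du -sb -- "$dir" | cut -f1)
  if [ "$size" -gt "$limit" ]; then
    echo "too large: $size bytes > $limit bytes" >&2
    return 1
  fi
  echo "ok: $size bytes <= $limit bytes"
}
```

For example, check_unzipped_size src would be run after copying lo.tar.gz into src and before deploying.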

fengsi commented 6 years ago

Thanks for the info! OK, so lo.tar.gz has to be deployed as a tarball and then unpacked inside Lambda (i.e. the handler needs to unpack it before using it)? The unpacked size of that file is already 350.8 MB.

vladholubiev commented 6 years ago

Right, it just runs cd /tmp && tar -xf /var/task/lo.tar.gz at runtime. The /tmp folder has 512 MB of free space, so about 160 MB is left after unpacking.
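
A slightly more defensive version of that runtime step could skip extraction when a warm container still has the previous copy in /tmp. This is a sketch, not what the repo does: the function name unpack_once and the .unpacked marker file are hypothetical, and it relies on /tmp surviving between invocations in the same execution environment, which Lambda does not guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract an archive into a directory only once.
# A marker file is written only after a successful extraction, so a reused
# (warm) environment skips the work and a failed extraction is retried.
unpack_once() {
  local archive=$1 dest=$2
  local marker="$dest/.unpacked"
  if [ -f "$marker" ]; then
    return 0
  fi
  mkdir -p "$dest" || return 1
  # GNU tar detects gzip compression on its own when reading from a file.
  tar -C "$dest" -xf "$archive" || return 1
  touch "$marker"
}
```

With the paths from the thread this would be called as unpack_once /var/task/lo.tar.gz /tmp.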

fengsi commented 6 years ago

OK, I see, that explains it. Thanks!

Is there any way we can trim it down further? I ran ldd soffice.bin and got the list of libs it needs, but haven't checked whether I can actually remove anything. I'd assume that besides those direct deps, more libs/resource files are needed, so it's not straightforward to make the tarball smaller.

fengsi commented 6 years ago

FYI, ldd soffice.bin output:

    linux-vdso.so.1 =>  (0x00007fff196eb000)
    libmergedlo.so => /home/vagrant/instdir/program/./libmergedlo.so (0x00007fb3a5ee3000)
    libuno_sal.so.3 => /home/vagrant/instdir/program/./libuno_sal.so.3 (0x00007fb3a5c85000)
    libc.so.6 => /lib64/libc.so.6 (0x00007fb3a58c1000)
    libgpgmepp.so.6 => /home/vagrant/instdir/program/./libgpgmepp.so.6 (0x00007fb3a5663000)
    libicuuc.so.60 => /home/vagrant/instdir/program/./libicuuc.so.60 (0x00007fb3a52ab000)
    libz.so.1 => /lib64/libz.so.1 (0x00007fb3a5095000)
    libssl3.so => /usr/lib64/libssl3.so (0x00007fb3a4e4a000)
    libsmime3.so => /usr/lib64/libsmime3.so (0x00007fb3a4c23000)
    libnss3.so => /usr/lib64/libnss3.so (0x00007fb3a4901000)
    libnssutil3.so => /usr/lib64/libnssutil3.so (0x00007fb3a46d4000)
    libplds4.so => /lib64/libplds4.so (0x00007fb3a44d0000)
    libplc4.so => /lib64/libplc4.so (0x00007fb3a42cb000)
    libnspr4.so => /lib64/libnspr4.so (0x00007fb3a408e000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb3a3e72000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007fb3a3c6e000)
    libicui18n.so.60 => /home/vagrant/instdir/program/./libicui18n.so.60 (0x00007fb3a37cb000)
    libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00007fb3a3554000)
    libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007fb3a3219000)
    libxml2.so.2 => /usr/lib64/libxml2.so.2 (0x00007fb3a2eb1000)
    libexpat.so.1 => /lib64/libexpat.so.1 (0x00007fb3a2c88000)
    libxslt.so.1 => /usr/lib64/libxslt.so.1 (0x00007fb3a2a4a000)
    libm.so.6 => /lib64/libm.so.6 (0x00007fb3a2748000)
    librt.so.1 => /lib64/librt.so.1 (0x00007fb3a2540000)
    liborcus-0.13.so.0 => /home/vagrant/instdir/program/./liborcus-0.13.so.0 (0x00007fb3a2218000)
    liborcus-parser-0.13.so.0 => /home/vagrant/instdir/program/./liborcus-parser-0.13.so.0 (0x00007fb3a1fde000)
    liblcms2.so.2 => /home/vagrant/instdir/program/./liblcms2.so.2 (0x00007fb3a1d84000)
    libcairo.so.2 => /home/vagrant/instdir/program/./libcairo.so.2 (0x00007fb3a1a01000)
    libpixman-1.so.0 => /home/vagrant/instdir/program/./libpixman-1.so.0 (0x00007fb3a175b000)
    libfontconfig.so.1 => /usr/lib64/libfontconfig.so.1 (0x00007fb3a1526000)
    libfreetype.so.6 => /usr/lib64/libfreetype.so.6 (0x00007fb3a128a000)
    libXext.so.6 => /usr/lib64/libXext.so.6 (0x00007fb3a1078000)
    libSM.so.6 => /usr/lib64/libSM.so.6 (0x00007fb3a0e71000)
    libICE.so.6 => /usr/lib64/libICE.so.6 (0x00007fb3a0c55000)
    libuno_cppu.so.3 => /home/vagrant/instdir/program/./libuno_cppu.so.3 (0x00007fb3a0a18000)
    libuno_cppuhelpergcc3.so.3 => /home/vagrant/instdir/program/./libuno_cppuhelpergcc3.so.3 (0x00007fb3a0719000)
    libi18nlangtag.so => /home/vagrant/instdir/program/./libi18nlangtag.so (0x00007fb3a04fb000)
    libuno_salhelpergcc3.so.3 => /home/vagrant/instdir/program/./libuno_salhelpergcc3.so.3 (0x00007fb3a02f4000)
    libxmlreaderlo.so => /home/vagrant/instdir/program/./libxmlreaderlo.so (0x00007fb3a00ea000)
    libepoxy.so => /home/vagrant/instdir/program/./libepoxy.so (0x00007fb39fdff000)
    libclewlo.so => /home/vagrant/instdir/program/./libclewlo.so (0x00007fb39fbfa000)
    libclucene.so => /home/vagrant/instdir/program/./libclucene.so (0x00007fb39f816000)
    libpdfiumlo.so => /home/vagrant/instdir/program/./libpdfiumlo.so (0x00007fb39f13d000)
    libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007fb39ee38000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb39ec22000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fb3aa447000)
    libgpgme.so.11 => /home/vagrant/instdir/program/./libgpgme.so.11 (0x00007fb39e9de000)
    libassuan.so.0 => /home/vagrant/instdir/program/./libassuan.so.0 (0x00007fb39e7cc000)
    libicudata.so.60 => /home/vagrant/instdir/program/./libicudata.so.60 (0x00007fb39cc50000)
    libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007fb39ca2f000)
    libssh2.so.1 => /usr/lib64/libssh2.so.1 (0x00007fb39c807000)
    libpsl.so.0 => /usr/lib64/libpsl.so.0 (0x00007fb39c592000)
    libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00007fb39c344000)
    libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00007fb39c05d000)
    libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00007fb39be2b000)
    libcom_err.so.2 => /usr/lib64/libcom_err.so.2 (0x00007fb39bc28000)
    liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007fb39ba19000)
    libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007fb39b7c7000)
    libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007fb39b5a9000)
    liblzma.so.5 => /usr/lib64/liblzma.so.5 (0x00007fb39b385000)
    libxcb-shm.so.0 => /usr/lib64/libxcb-shm.so.0 (0x00007fb39b183000)
    libxcb-render.so.0 => /usr/lib64/libxcb-render.so.0 (0x00007fb39af7a000)
    libXrender.so.1 => /usr/lib64/libXrender.so.1 (0x00007fb39ad71000)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00007fb39ab6d000)
    libreglo.so => /home/vagrant/instdir/program/./libreglo.so (0x00007fb39a953000)
    libunoidllo.so => /home/vagrant/instdir/program/./libunoidllo.so (0x00007fb39a6f1000)
    liblangtag-lo.so.1 => /home/vagrant/instdir/program/./liblangtag-lo.so.1 (0x00007fb39a4cb000)
    libgpg-error.so.0 => /home/vagrant/instdir/program/./libgpg-error.so.0 (0x00007fb39a2b8000)
    libunistring.so.0 => /usr/lib64/libunistring.so.0 (0x00007fb399fa2000)
    libssl.so.10 => /usr/lib64/libssl.so.10 (0x00007fb399d34000)
    libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fb39994c000)
    libicuuc.so.50 => /usr/lib64/libicuuc.so.50 (0x00007fb3995d6000)
    libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00007fb3993c7000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fb3991c4000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fb398faa000)
    libsasl2.so.2 => /usr/lib64/libsasl2.so.2 (0x00007fb398d8f000)
    libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007fb398b8c000)
    libstorelo.so => /home/vagrant/instdir/program/./libstorelo.so (0x00007fb398970000)
    libicudata.so.50 => /usr/lib64/libicudata.so.50 (0x00007fb39739d000)
    libselinux.so.1 => /usr/lib64/libselinux.so.1 (0x00007fb39717c000)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fb396f45000)
    libfreebl3.so => /lib64/libfreebl3.so (0x00007fb396d43000)
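
When deciding what could go, it may help to split that output into libraries shipped inside instdir/program and ones coming from the host system. A minimal awk-based sketch; the function name split_ldd is hypothetical. Keep in mind that ldd only reports libraries linked at load time: components LibreOffice loads later via dlopen don't appear, so a library missing from this list is not automatically safe to delete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify `ldd` output read from stdin.
# Prints "bundled <lib>" for libraries resolved under the given prefix,
# "system <lib>" for libraries resolved elsewhere, and "missing <lib>" for
# entries reported as "not found". Lines without "=> /path" (linux-vdso,
# the dynamic loader itself) are skipped.
split_ldd() {
  local prefix=$1
  awk -v p="$prefix" '
    $2 == "=>" && $3 == "not" { print "missing", $1; next }
    $2 == "=>" && substr($3, 1, 1) == "/" {
      kind = (index($3, p) == 1) ? "bundled" : "system"
      print kind, $1
    }
  '
}
```

For the output above, ldd soffice.bin | split_ldd /home/vagrant/instdir/program/ | sort groups the bundled and system libraries; rerunning it after deleting files shows any "missing" entries.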

vladholubiev commented 6 years ago

@fengsi check out the new repo: https://github.com/vladgolubev/aws-lambda-libreoffice

It's 85 MB now, thanks to Brotli compression.

It takes ~1.5 seconds on cold start. I still need to test it more, but at first glance it works well.
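
A tarball like that can be produced by streaming tar straight into the compressor. This is a sketch, not the new repo's actual build script: pack_tree is a hypothetical name, and the default brotli --best -c (maximum quality, output to stdout) is an assumed setting; any compressor that reads stdin and writes stdout can be passed instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tar one entry of a base directory and stream it
# through a compressor.
# usage: pack_tree <base-dir> <entry> <out-file> [compressor command...]
# Defaults to `brotli --best -c`. pipefail makes a failure in either tar
# or the compressor fail the function; a partial output file is removed.
pack_tree() {
  local base=$1 entry=$2 out=$3
  shift 3
  if [ "$#" -eq 0 ]; then
    set -- brotli --best -c
  fi
  if ! ( set -o pipefail; tar -C "$base" -cf - "$entry" | "$@" > "$out" ); then
    rm -f "$out"
    return 1
  fi
}
```

For example, pack_tree . instdir lo.tar.br yields an archive that extracts to instdir/ under whatever directory tar is pointed at.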

fengsi commented 6 years ago

Quick benchmark on a local Amazon Linux VM:

    $ time brotli -cd lo.tar.br | tar -C /tmp -xf -
    0m2.340s
    0m2.321s
    0m2.365s
    0m2.612s
    0m2.441s

    $ time tar -xf lo.tar.gz
    0m2.857s
    0m3.207s
    0m2.862s
    0m2.899s
    0m2.854s

So the file is smaller and decompression is also faster (about 2.4 s vs 2.9 s on average). Not sure if piping brotli into tar is best practice, but the cold start should be faster thanks to the smaller file and faster decompression.
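
Piping the decompressor into tar is a common pattern. The main caveat is that, by default, a shell pipeline's exit status is tar's alone, so a decompressor failure on a corrupt or truncated archive can be missed. A minimal sketch that guards against this with pipefail; unpack_stream is a hypothetical name, and the decompressor is passed in as a command that reads stdin and writes stdout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stream-decompress an archive into tar.
# usage: unpack_stream <archive> <dest-dir> <decompressor command...>
# With pipefail, a failing decompressor makes the function fail even if
# tar itself exits successfully.
unpack_stream() {
  local archive=$1 dest=$2
  shift 2
  mkdir -p "$dest" || return 1
  (
    set -o pipefail
    "$@" < "$archive" | tar -C "$dest" -xf -
  )
}
```

The benchmarked command would then become unpack_stream lo.tar.br /tmp brotli -dc.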

Thanks! Now if only we could trim the instdir stuff more... LOL

ncruces commented 6 years ago

Are the files in ./instdir/share/config/soffice.cfg necessary? Removing them seems to work, and at a glance they look like UI configuration.

These won't save much space, but there are 1612 files and 136 folders in there (out of 2782 files and 248 folders for the whole package, so more than half). So maybe that helps?
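
Those counts can be reproduced, and the trim applied, with find and rm. A sketch assuming the extracted tree's root is passed in; count_entries and trim_ui_config are hypothetical names. Since removal has only been observed to "seem to work", it's worth running a few representative conversions after trimming.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for measuring and trimming an extracted instdir.

# Print "<files> files, <dirs> dirs" under a path (the path itself counts as a dir).
count_entries() {
  local root=$1 files dirs
  files=$(find "$root" -type f | wc -l)
  dirs=$(find "$root" -type d | wc -l)
  echo "$((files)) files, $((dirs)) dirs"
}

# Remove share/config/soffice.cfg (the UI configuration files).
trim_ui_config() {
  rm -rf -- "$1/share/config/soffice.cfg"
}
```

Running count_entries instdir before and after trim_ui_config instdir shows how many entries the archive loses.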

vladholubiev commented 6 years ago

@ncruces thanks for looking into it! That might help. Working with lots of small files takes a decent chunk of CPU time. I'll benchmark it next time I have a moment.

ncruces commented 6 years ago

Also, consider using zstd instead of brotli.

When turned up to level 22 (literally zstd --ultra -22; see the man page), I get an 85.1 MB tarball. Benchmarks suggest it decompresses faster, but YMMV (I'm running this on GCP Cloud Functions in Go).
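
Since results vary by platform, one way to compare decompressors on your own target is to time several extractions of each archive. A rough sketch assuming GNU date (for the nanosecond %N format); time_unpack is a hypothetical name and the number of runs is up to you.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark: extract an archive N times with a given
# decompressor and print the wall-clock milliseconds of each run.
# usage: time_unpack <archive> <runs> <decompressor command...>
# Assumes GNU date, which supports %N (nanoseconds).
time_unpack() {
  local archive=$1 runs=$2 i start end dest
  shift 2
  for ((i = 0; i < runs; i++)); do
    dest=$(mktemp -d) || return 1
    start=$(date +%s%N)
    ( set -o pipefail; "$@" < "$archive" | tar -C "$dest" -xf - ) || {
      rm -rf "$dest"
      return 1
    }
    end=$(date +%s%N)
    echo $(((end - start) / 1000000))
    rm -rf "$dest"
  done
}
```

With a zstd archive built via tar -cf - instdir | zstd --ultra -22 -c > lo.tar.zst, the comparison would be time_unpack lo.tar.zst 5 zstd -dc against time_unpack lo.tar.br 5 brotli -dc.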

In my experience at least, cold starts with aws-lambda-libreoffice are more painful on GCP than the ~1.5 s you mentioned above.

ncruces commented 6 years ago

Another thought: LibreOffice creates a "user installation" directory on first run, which I think contributes to cold start overhead.

Currently this is created in instdir/user. You can include this directory in your tar file, which I think shaves another second or so off the cold start. Just do a simple conversion before creating your archive.
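
That warm-up conversion could look like the following. A sketch assuming the extracted tree is at the given instdir path, that its program/soffice launcher accepts the standard --headless --convert-to pdf --outdir options, and (as described above) that this build creates the user installation at instdir/user; prewarm_profile is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one throwaway conversion so LibreOffice creates
# its user installation (instdir/user) before the archive is built.
prewarm_profile() {
  local instdir=$1 work rc
  work=$(mktemp -d) || return 1
  echo "warm-up" > "$work/warmup.txt"
  "$instdir/program/soffice" --headless --convert-to pdf \
    --outdir "$work" "$work/warmup.txt" > /dev/null
  rc=$?
  rm -rf "$work"
  # Succeed only if the conversion ran and the user directory now exists.
  [ "$rc" -eq 0 ] && [ -d "$instdir/user" ]
}
```

The archive would then be built only if prewarm_profile "$PWD/instdir" succeeds.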

An alternative (which is what I'm doing) is to create a separate archive for this directory, unpack a fresh copy of it for every run, and then run soffice -env:UserInstallation=file:///tmp/XPTO. This avoids problems with the profile becoming corrupted between runs.
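
A per-run version of that approach might look like this. It's a sketch under assumptions: the profile archive is taken to be a tarball whose top-level entry is user/ (e.g. made with tar -C instdir -cf profile.tar user), the function name convert_with_fresh_profile is hypothetical, and a random directory under /tmp stands in for /tmp/XPTO so each run gets its own copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a document to PDF using a throwaway copy of
# a pre-built user installation, then delete that copy.
# usage: convert_with_fresh_profile <soffice> <profile.tar> <input> <outdir>
convert_with_fresh_profile() {
  local soffice=$1 profile_tar=$2 input=$3 outdir=$4 profile rc
  profile=$(mktemp -d /tmp/lo-profile.XXXXXX) || return 1
  if ! tar -C "$profile" -xf "$profile_tar"; then
    rm -rf "$profile"
    return 1
  fi
  # "file://" + "/tmp/..." gives a file:///tmp/... URL, as in the thread.
  "$soffice" -env:UserInstallation="file://$profile" \
    --headless --convert-to pdf --outdir "$outdir" "$input"
  rc=$?
  rm -rf "$profile"
  return "$rc"
}
```

Deleting the copy after every run is the point of the design: no run can inherit a half-written profile from a previous one.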

sunnyportsmouth commented 6 years ago

First of all, thanks for the excellent info here. Do you happen to have a trimmed, Lambda-uploadable version of LibreOffice 6.1.3? If not, step-by-step instructions would be helpful. TIA