wkhtmltopdf / packaging

Packaging of wkhtmltopdf releases
https://wkhtmltopdf.org/downloads.html#stable

AWS Lambda layer zip #77

Closed deniszatsepin closed 3 years ago

deniszatsepin commented 3 years ago

Following the long discussion in https://github.com/wkhtmltopdf/wkhtmltopdf/issues/4523, @ashkulz suggested automating the generation of a wkhtmltopdf package suitable for the AWS Lambda environment. Here I've introduced a new property for the Docker-based build configuration, called postcompile, which specifies a shell script to be executed inside the container at the end of the compilation process. It's also possible to specify zip as an output format, which is suitable for AWS Lambda layers; the build then produces a zip file. To use it in the AWS Lambda environment, either unpack it into an appropriate directory or use the zip file directly as a Lambda layer. It hasn't been tested on AWS Lambda yet; I'm going to do that tomorrow. Comments and suggestions are warmly welcome!

deniszatsepin commented 3 years ago

I've tested the generated zip file as a layer for AWS Lambda. It works as expected and generates valid PDF files.

deniszatsepin commented 3 years ago

@ashkulz thanks for the review. Could you please explain how the Azure pipeline is executed (and by whom) and where to find its results?

ashkulz commented 3 years ago

@deniszatsepin the Azure Pipelines build is run manually by me. You can see the results here.

annshress commented 3 years ago

How long does it take to package for the amazon lambda layer?

ashkulz commented 3 years ago

@annshress it generally takes 40 minutes, but I'll have to make a special release for that. I'll be first checking it locally.

deniszatsepin commented 3 years ago

> How long does it take to package for the amazon lambda layer?

Locally, I have a resulting zip archive in about 3 minutes.

ashkulz commented 3 years ago

@deniszatsepin is that just the .zip part? Because the Qt compilation should take much longer.

annshress commented 3 years ago

> Locally, I have a resulting zip archive in about 3 minutes.

I am currently packaging this, and it has been running for approximately 30 minutes so far.

Edit: Yep, around 40 minutes.

annshress commented 3 years ago

/opt/wkhtmltox/bin/wkhtmltopdf: error while loading shared libraries: libjpeg.so.62: cannot open shared object file: No such file or directory

@deniszatsepin Any idea?

Currently, the wkhtmltox layer contains:

$ ls /opt/wkhtmltox/lib/
libbz2.so.1
libexpat.so.1
libfontconfig.so.1
libfreetype.so.6
libjpeg.so.62
libpng15.so.15
libuuid.so.1
libX11.so.6
libXau.so.6
libxcb.so.1
libXext.so.6
libXrender.so.1

deniszatsepin commented 3 years ago

@ashkulz @annshress Sorry for my misleading estimate. The whole process for the amazonlinux2_lambda target takes 13 minutes on my laptop. The packaging itself takes about 20 seconds. (Intel® Core™ i7-8750H CPU @ 2.20GHz × 12, 15.5 GiB)

deniszatsepin commented 3 years ago

@annshress

Are you running it in the AWS Lambda environment or locally?

annshress commented 3 years ago

@deniszatsepin

I invoked it. Somehow the binary isn't discovering the .so files.

annshress commented 3 years ago

@deniszatsepin

.so dependencies

    linux-vdso.so.1 (0x00007ffe45ca7000)
    libjpeg.so.62 => not found
    libpng15.so.15 => not found
    libXrender.so.1 => not found
    libfontconfig.so.1 => not found
    libfreetype.so.6 => not found
    libXext.so.6 => not found
    libX11.so.6 => not found
    libssl.so.10 => /lib64/libssl.so.10 (0x00007f65cf999000)
    libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f65cf544000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f65cf32f000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f65cf12b000)
...
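
A quick way to spot this situation is to filter the `ldd` output for unresolved entries. The following is only a sketch and not part of the packaging scripts: `missing_libs` is a hypothetical helper name, and it assumes nothing beyond the usual `name => not found` line format shown in the listing above.

```shell
#!/bin/sh
# missing_libs: read `ldd` output on stdin and print the name of every
# shared library that the dynamic loader could not resolve.
# Hypothetical helper; relies only on the "name => not found" format.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}

# Usage (the path is an assumption; adjust it to where the layer is unpacked):
#   ldd /opt/bin/wkhtmltopdf | missing_libs
```

An empty result means every dependency was resolved with the current library search path.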

deniszatsepin commented 3 years ago

@annshress This package is specifically for the AWS Lambda environment. If you use the contents of the zip file as a layer, the shared libraries in the lib directory will be picked up by the operating system. The contents of the zip file should end up directly in the /opt directory, so the paths should be /opt/bin/wkhtmltopdf, /opt/lib/..., /opt/fonts/....

It will also work if you put it in the root of a Lambda function, in which case the paths will look like this: /var/task/bin/wkhtmltopdf, /var/task/lib/..., /var/task/fonts/.... I prefer the layer option.
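
The two locations described above can be probed from a small wrapper. This is a minimal sketch under stated assumptions: `layer_root`, `run_wkhtmltopdf`, and the `WKHTMLTOX_ROOT` override are hypothetical names introduced for illustration, and setting `LD_LIBRARY_PATH` and `FONTCONFIG_PATH` explicitly may be unnecessary inside Lambda itself.

```shell
#!/bin/sh
# layer_root: print the first directory that contains an executable
# bin/wkhtmltopdf. With no arguments it checks the two locations named
# above: /opt (layer) and /var/task (function bundle).
layer_root() {
    [ "$#" -gt 0 ] || set -- /opt /var/task
    for root in "$@"; do
        if [ -x "$root/bin/wkhtmltopdf" ]; then
            printf '%s\n' "$root"
            return 0
        fi
    done
    return 1
}

# run_wkhtmltopdf: call the bundled binary with the library and fontconfig
# paths pointing into the same root. WKHTMLTOX_ROOT is a hypothetical
# override that is handy for testing outside Lambda.
run_wkhtmltopdf() {
    root=$(layer_root ${WKHTMLTOX_ROOT:+"$WKHTMLTOX_ROOT"}) || {
        echo "wkhtmltopdf not found" >&2
        return 1
    }
    LD_LIBRARY_PATH="$root/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
    FONTCONFIG_PATH="$root/fonts" \
        "$root/bin/wkhtmltopdf" "$@"
}
```

For example, `run_wkhtmltopdf input.html /tmp/output.pdf` would work the same way whether the files were deployed as a layer or bundled with the function; the file names here are placeholders.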

annshress commented 3 years ago

@deniszatsepin Thank you for making packaging easier.

ashkulz commented 3 years ago

So it was generated locally:

$ unzip -l wkhtmltox-0.12.7-0.20200922.15.dev.36c0e9a.amazonlinux2_lambda.zip 
Archive:  wkhtmltox-0.12.7-0.20200922.15.dev.36c0e9a.amazonlinux2_lambda.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2020-09-22 14:34   wkhtmltox/
        0  2020-09-22 14:34   wkhtmltox/bin/
 39534592  2020-09-22 14:33   wkhtmltox/bin/wkhtmltopdf
 39460864  2020-09-22 14:34   wkhtmltox/bin/wkhtmltoimage
        0  2020-09-22 14:34   wkhtmltox/fonts/
        0  2020-09-22 14:34   wkhtmltox/fonts/dejavu/
   576004  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSansCondensed-Oblique.ttf
   611556  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSans-Oblique.ttf
   345212  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSans-ExtraLight.ttf
   672300  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSans-Bold.ttf
   611212  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSans-BoldOblique.ttf
   631992  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSansCondensed-Bold.ttf
       36  2020-09-22 14:34   wkhtmltox/fonts/dejavu/.uuid
   580168  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSansCondensed-BoldOblique.ttf
   720012  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSans.ttf
   643852  2020-09-22 14:34   wkhtmltox/fonts/dejavu/DejaVuSansCondensed.ttf
       36  2020-09-22 14:34   wkhtmltox/fonts/.uuid
      203  2020-09-22 14:34   wkhtmltox/fonts/fonts.conf
        0  2020-09-22 14:34   wkhtmltox/lib/
   179208  2020-09-22 14:34   wkhtmltox/lib/libpng15.so.15
    40552  2020-09-22 14:34   wkhtmltox/lib/libXrender.so.1
   165784  2020-09-22 14:34   wkhtmltox/lib/libxcb.so.1
    20056  2020-09-22 14:34   wkhtmltox/lib/libuuid.so.1
   272736  2020-09-22 14:34   wkhtmltox/lib/libfontconfig.so.1
   272904  2020-09-22 14:34   wkhtmltox/lib/libjpeg.so.62
  1318736  2020-09-22 14:34   wkhtmltox/lib/libX11.so.6
   758536  2020-09-22 14:34   wkhtmltox/lib/libfreetype.so.6
    15432  2020-09-22 14:34   wkhtmltox/lib/libXau.so.6
    75816  2020-09-22 14:34   wkhtmltox/lib/libXext.so.6
    68128  2020-09-22 14:34   wkhtmltox/lib/libbz2.so.1
   206168  2020-09-22 14:34   wkhtmltox/lib/libexpat.so.1
---------                     -------
 87782095                     31 files

@annshress can you confirm if it works for you?

ashkulz commented 3 years ago

@deniszatsepin is there a need to change the paths? I'd prefer if we did that automatically rather than having to document it.

deniszatsepin commented 3 years ago

@ashkulz Yes, let me change the archiving part so that bin, lib, and fonts are in the archive root rather than under a wkhtmltox subdirectory.
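
The layout described here can be checked mechanically from an archive listing. This is only a sketch: `check_layer_layout` is a hypothetical helper, not part of the build scripts, and it works on a plain list of member names so it does not depend on any particular archive tool.

```shell
#!/bin/sh
# check_layer_layout: read archive member names on stdin (one per line)
# and succeed only if every entry lives under bin/, lib/ or fonts/.
# Entries that sit anywhere else are printed on stdout.
check_layer_layout() {
    awk -F/ '
        $1 != "bin" && $1 != "lib" && $1 != "fonts" {
            print "unexpected top-level entry: " $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (the file name is a placeholder):
#   unzip -Z1 layer.zip | check_layer_layout
```

An archive that still carries the wkhtmltox/ prefix would fail this check, since its only top-level entry is wkhtmltox.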

annshress commented 3 years ago

> @annshress can you confirm if it works for you?

@ashkulz Yes, it does. I still need to check the output file, though.

ashkulz commented 3 years ago

@deniszatsepin / @annshress: can you confirm it works for you properly? I tried to reproduce this in docker and it generated the file, but with a fontconfig error:

$ docker run --rm -it -v$PWD/lambda:/lambda amazonlinux:2
bash-4.2# LD_LIBRARY_PATH=/lambda/lib /lambda/bin/wkhtmltopdf https://google.com/ /lambda/google.pdf
Fontconfig error: Cannot load default config file
Loading pages (1/6)
Counting pages (2/6)                                               
Resolving links (4/6)                                                       
Loading headers and footers (5/6)                                           
Printing pages (6/6)
Done                                                                      

Also, the output google.pdf didn't have any fonts in it :man_shrugging:

deniszatsepin commented 3 years ago

@ashkulz try to use this command:

$ docker run --rm -it -v$PWD/lambda:/opt amazonlinux:2
bash-4.2# LD_LIBRARY_PATH=/opt/lib FONTCONFIG_PATH=/opt/fonts /opt/bin/wkhtmltopdf https://google.com/ /opt/google.pdf

fontconfig is configured to look for fonts in /opt/fonts or /var/task/fonts.

ashkulz commented 3 years ago

That's much better, thanks!

$ docker run --rm -it -v$PWD/foo:/opt amazonlinux:2
bash-4.2# LD_LIBRARY_PATH=/opt/lib FONTCONFIG_PATH=/opt/fonts /opt/bin/wkhtmltopdf https://google.com/ /opt/google.pdf

gives me google.pdf which still has some missing fonts. I guess that's because the Indic text requires additional fonts? May need to add that in the FAQ.
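
If additional fonts do turn out to be needed, one option is to point fontconfig at an extra font directory through a custom configuration file. The sketch below is an assumption, not the fonts.conf shipped in the zip: `write_fonts_conf` is a hypothetical helper, `/opt/extra-fonts` is a placeholder directory, and the cache is placed under /tmp because that is the writable location in Lambda.

```shell
#!/bin/sh
# write_fonts_conf: write a minimal fontconfig file to $1 that lists every
# remaining argument as a font directory. Directory names are written
# verbatim, so they must not contain XML special characters.
write_fonts_conf() {
    out=$1
    shift
    {
        printf '%s\n' '<?xml version="1.0"?>'
        printf '%s\n' '<!DOCTYPE fontconfig SYSTEM "fonts.dtd">'
        printf '%s\n' '<fontconfig>'
        for dir in "$@"; do
            printf '  <dir>%s</dir>\n' "$dir"
        done
        printf '%s\n' '  <cachedir>/tmp/fontconfig-cache</cachedir>'
        printf '%s\n' '</fontconfig>'
    } > "$out"
}

# Usage (all paths are placeholders):
#   mkdir -p /tmp/fontconfig
#   write_fonts_conf /tmp/fontconfig/fonts.conf /opt/fonts /opt/extra-fonts
#   FONTCONFIG_PATH=/tmp/fontconfig /opt/bin/wkhtmltopdf in.html /tmp/out.pdf
```

Whether this resolves the missing glyphs depends on which fonts are placed in the extra directory; that has not been verified here.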

ashkulz commented 3 years ago

Thanks for the contribution, @deniszatsepin! I'll start a build and update the downloads page by tomorrow.

deniszatsepin commented 3 years ago

Many thanks for the collaboration, guys! Ping me if anything needs to be improved in the Lambda build.

deniszatsepin commented 3 years ago

Hey @ashkulz, how is it going with the new release and the downloads page update? Can I help somehow?

ashkulz commented 3 years ago

@deniszatsepin I'll do that today, didn't have access to the laptop which has the gpg keys and later on forgot about it :see_no_evil:

deniszatsepin commented 3 years ago

Yeah, it happens. I'm looking forward to using the new image in my company's project, and I plan to write an article about PDF generation with Lambda and wkhtmltopdf. So it would be awesome if it were released. Thank you!

ashkulz commented 3 years ago

@deniszatsepin I've made the release in this repository. Would appreciate a PR for the downloads page, as I think the FAQ needs to be changed as well ...

deniszatsepin commented 3 years ago

@ashkulz awesome news! Thank you! Regarding the downloads page and FAQ: could you please point me to the particular file that should be changed?

ashkulz commented 3 years ago

It's in the wkhtmltopdf repository, docs/downloads.md.

ashkulz commented 3 years ago

@deniszatsepin if you don't have time, I'll just add the AWS Lambda zip to the downloads page, not sure what to do about the FAQ though.

deniszatsepin commented 3 years ago

@ashkulz, sorry. I have time for that today. I'll do it.