qpdf / qpdf

qpdf: A content-preserving PDF document transformer
https://qpdf.sourceforge.io/
Apache License 2.0

Building standalone executables #352

Closed: ru-zar closed this issue 3 years ago

ru-zar commented 4 years ago

Please excuse me, I don't 100% know what I'm doing and my terminology may be off.

I ultimately want to use qpdf on AWS Lambda. For similar projects, I've been able to build a standalone executable, put it in a tarball, download and untar it when the Lambda function boots up, and use it while the function is running.

In this particular case my hope was that I could compile qpdf on an AWS Codebuild instance, tarball the resulting qpdf/build directory, untar as described, and run build/qpdf file1.pdf file2.pdf output.pdf.

When I do so, I get an error like: "./qpdf/build/qpdf: line 202: cd: /codebuild/output/src720098561/src/github.com/qpdf/qpdf: No such file or directory" which seems to indicate that the compiled qpdf thinks it should be in the same place where it was built.

I've tried setting DESTDIR to "mybuild" but as far as I can tell this isn't changing the build. Maybe I'm missing something?

I'm looking for specific advice on how to build a standalone executable that I can move to another system, or any other kind of advice on accomplishing what I've described at the beginning here.

jberkenbilt commented 4 years ago

There are a few things that might work. If you don't need to build it yourself and are okay using a prebuilt binary, you can try the AppImage version of qpdf, which can be downloaded from the qpdf release area on GitHub. This acts like a stand-alone executable but is actually an encapsulation of the executable and the shared libraries it requires. You can build the AppImage yourself using the build-appimage script from the azure-pipelines directory, but you probably need to know what you're doing to do this.

The path to the latest released AppImage is here: https://github.com/qpdf/qpdf/releases/download/release-qpdf-8.4.2/qpdf-8.4.2-x86_64.AppImage

You could grab this file, chmod +x it, and rename it to qpdf, and it would work as a stand-alone executable. Alternatively, you can build it yourself (Docker is required, and you have to be able to run Docker in privileged mode) and do the same thing.

Another option would be to pass --disable-shared to ./configure when you build. In that case, qpdf/build/qpdf will be mostly a stand-alone executable. It will still depend on some system libraries and on the jpeg and zlib libraries. I don't know whether those will be on the VM running your Lambda function. I've used Lambda, but not in this way. It's almost certain that zlib will be there, but maybe not the jpeg library.

Unfortunately, there's not a really good way to generate a truly stand-alone executable without modifying the build. I tried a few things, but it doesn't really work.

I hope this gives you enough of a hint. DESTDIR only affects installation, not building. To clarify what you saw: when you build without --disable-shared, qpdf/build/qpdf is a shell script that handles the shared-library setup. With --disable-shared, it's an actual executable, but it will still depend on external libraries.
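Because the difference between the two build modes comes down to whether qpdf/build/qpdf is a shell script or a real binary, you can tell which one you have by looking at the first bytes of the file. The following Node.js sketch is an illustration only, not part of qpdf, and `classifyExecutable` is a hypothetical helper name. It reports "script" for files starting with `#!` and "elf" for files starting with the ELF magic number.

```javascript
const fs = require("fs");

// Classify a file by its first four bytes:
//   "#!"              -> "script" (e.g. the wrapper produced by a shared build)
//   0x7f 'E' 'L' 'F'  -> "elf"    (a real executable or shared library)
//   anything else     -> "other"
function classifyExecutable(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const buf = Buffer.alloc(4);
    const bytesRead = fs.readSync(fd, buf, 0, 4, 0);
    if (bytesRead >= 2 && buf[0] === 0x23 && buf[1] === 0x21) return "script";
    if (bytesRead === 4 && buf[0] === 0x7f && buf.toString("latin1", 1, 4) === "ELF") {
      return "elf";
    }
    return "other";
  } finally {
    fs.closeSync(fd);
  }
}
```

For a result of "elf", running `ldd` on the file lists the shared libraries (such as the jpeg and zlib libraries mentioned above) that must still be present on the target system.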

Sparticuz commented 4 years ago

I'm working on getting qpdf up and running on Lambda. I've gotten qpdf to run; however, it looks like it needs libfuse.so.2. I'm looking into pulling that file into Lambda and will reply with my progress.

"trace": [
        "Runtime.UnhandledPromiseRejection: Error: Command failed: qpdf --encrypt 1234 1234 256 -- /tmp/input.pdf /tmp/encrypted.pdf",
        "dlopen(): error loading libfuse.so.2",
        "",
        "AppImages require FUSE to run. ",
        "You might still be able to extract the contents of this AppImage ",
        "if you run it with the --appimage-extract option. ",
        "See https://github.com/AppImage/AppImageKit/wiki/FUSE ",
        "for more information",
        "",
        "    at process.<anonymous> (/var/runtime/index.js:35:15)",
        "    at process.emit (events.js:223:5)",
        "    at process.EventEmitter.emit (domain.js:475:20)",
        "    at processPromiseRejections (internal/process/promises.js:201:33)",
        "    at processTicksAndRejections (internal/process/task_queues.js:95:32)"
    ]
Sparticuz commented 4 years ago

I got libfuse.so.2 uploaded to my Lambda function; however, now it wants me to run modprobe fuse. Is there a reason why qpdf needs FUSE? Is it possible to build without needing FUSE? Is this an AppImage thing or a qpdf thing?

"trace": [
        "Runtime.UnhandledPromiseRejection: Error: Command failed: qpdf --encrypt 1234 1234 256 -- /tmp/input.pdf /tmp/encrypted.pdf",
        "fuse: device not found, try 'modprobe fuse' first",
        "open dir error: No such file or directory",
        "",
        "    at process.<anonymous> (/var/runtime/index.js:35:15)",
        "    at process.emit (events.js:223:5)",
        "    at process.EventEmitter.emit (domain.js:475:20)",
        "    at processPromiseRejections (internal/process/promises.js:201:33)",
        "    at processTicksAndRejections (internal/process/task_queues.js:95:32)"
    ]
jberkenbilt commented 4 years ago

It's an AppImage thing. qpdf itself does not need fuse.

Sparticuz commented 4 years ago

I'm using Node.js 12.x and Serverless; others' setups might be different.

  1. Run the following script LOCALLY in the root of your Lambda function. This will create a bin/ and a lib/ folder at the root of your function. (These need to be uploaded with your function. I will be looking into Lambda Layers as a solution, but this works too.)

    mkdir -p bin lib                       # make sure the target folders exist
    wget https://github.com/qpdf/qpdf/releases/download/release-qpdf-9.1.1/qpdf-9.1.1-x86_64.AppImage -O qpdf.AppImage
    chmod +x qpdf.AppImage                 # downloaded files are not executable
    ./qpdf.AppImage --appimage-extract     # unpacks into ./squashfs-root; no FUSE needed
    cp -R squashfs-root/usr/bin/* ./bin/
    cp -R squashfs-root/usr/lib/* ./lib/
    rm -rf squashfs-root qpdf.AppImage
  2. Add the following code to your handler. (Lambda already adds ${LAMBDA_TASK_ROOT}/lib to LD_LIBRARY_PATH; you only need to add ${LAMBDA_TASK_ROOT}/bin to PATH.)

    // On Lambda, LAMBDA_TASK_ROOT is the directory holding the deployed function code.
    if (process.env.LAMBDA_TASK_ROOT) {
      process.env.PATH = `${process.env.PATH}:${process.env.LAMBDA_TASK_ROOT}/bin`;
    } else {
      // Local run: assume the current directory is the function root.
      process.env.PATH = `${process.env.PATH}:./bin`;
    }
  3. Call qpdf as normal.
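For step 3, here is one way to call qpdf from a Node.js handler, reproducing the `qpdf --encrypt 1234 1234 256 -- /tmp/input.pdf /tmp/encrypted.pdf` command from the traces above. This sketch assumes qpdf is reachable through PATH (as set up in step 2), and `encryptArgs` and `encryptPdf` are hypothetical names. It uses `execFile` rather than `exec` so that each argument goes directly to qpdf without passing through a shell.

```javascript
const { execFile } = require("child_process");
const { promisify } = require("util");

const execFileAsync = promisify(execFile);

// Argument list for: qpdf --encrypt <user-pw> <owner-pw> 256 -- <input> <output>
function encryptArgs(userPassword, ownerPassword, input, output) {
  return ["--encrypt", userPassword, ownerPassword, "256", "--", input, output];
}

// Runs the qpdf found on PATH; no shell is involved, so file names and
// passwords need no quoting. Resolves with qpdf's stdout and stderr.
async function encryptPdf(userPassword, ownerPassword, input, output) {
  const { stdout, stderr } = await execFileAsync(
    "qpdf",
    encryptArgs(userPassword, ownerPassword, input, output)
  );
  return { stdout, stderr };
}
```

`execFile` rejects whenever the exit status is non-zero. qpdf exits with status 3 when it succeeds but issues warnings, so inspect `err.code` if you want to treat warnings as success.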

Sparticuz commented 4 years ago

I've released a Makefile that creates the package needed for upload to the Layers console. Just run the Makefile; it creates a ZIP file that can be uploaded. No extra code is needed to add any paths: just upload, attach the layer, and run qpdf as normal.

https://github.com/Sparticuz/qpdf-aws-lambda

jberkenbilt commented 4 years ago

@Sparticuz Thanks for this. I can incorporate this into the next qpdf release and just have my CI create a lambda layer zip file as one of the officially distributed files. I will credit you for the contribution. My version will be a little more precise in that it will only take from bin the things that are actually installed by qpdf (not less, busybox, etc.). Also, your Makefile is missing a chmod +x on the AppImage file. But the main idea here of using the AppImage as a way to get a binary with all the required libraries is worth keeping, and it would be a small add-on to the AppImage build.
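The two refinements mentioned here are copying only qpdf's own programs out of the extracted AppImage and marking the downloaded AppImage as executable. They might look like this in Node.js. The allow-list `QPDF_PROGRAMS` (qpdf, fix-qdf, zlib-flate) is an assumption about what a qpdf installation puts in bin/. `selectQpdfPrograms` and `makeExecutable` are hypothetical helpers, not part of qpdf's build scripts.

```javascript
const fs = require("fs");

// Assumed allow-list of programs a qpdf install places in bin/;
// adjust it for your qpdf version.
const QPDF_PROGRAMS = new Set(["qpdf", "fix-qdf", "zlib-flate"]);

// Keep only qpdf's own programs from a listing of squashfs-root/usr/bin,
// dropping extras bundled in the AppImage such as busybox or less.
function selectQpdfPrograms(binEntries) {
  return binEntries.filter((name) => QPDF_PROGRAMS.has(name)).sort();
}

// Equivalent of `chmod +x` (mode rwxr-xr-x) for the downloaded AppImage.
function makeExecutable(file) {
  fs.chmodSync(file, 0o755);
}
```

A typical call would be `selectQpdfPrograms(fs.readdirSync("squashfs-root/usr/bin"))` after running the AppImage with --appimage-extract.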

Sparticuz commented 4 years ago

Fine with me. I wasn't sure what was needed by qpdf, that's why I just used the whole thing. Same with /lib. You can also create a globally shareable layer in Lambda, but I'm not sure it's worth the trouble unless it could be automated in the CI.

Sparticuz commented 4 years ago

Also note: the symlink in the zip file ends up being just a duplicate of the file (libqpdf.so.28 -> libqpdf.so.28.0.1). I'm not sure whether that's addressable in the zip package.

jberkenbilt commented 4 years ago

Noted about the symlink. Maybe I'll play around with that, but if not, I'll just make the file be libqpdf.so.28, which is what the loader actually looks for.
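The file the loader looks for is the library's soname. Under the common lib<name>.so.<major>.<minor>.<patch> naming convention, the soname is the file name truncated after the major version, which the sketch below computes. The helper `conventionalSoname` is hypothetical and applies only the naming convention, which is an assumption. The authoritative value is the SONAME recorded inside the library, which `objdump -p` or `readelf -d` will show.

```javascript
// Derive the conventional soname from a versioned library file name,
// e.g. "libqpdf.so.28.0.1" -> "libqpdf.so.28". Returns null when the name
// does not follow the lib<name>.so.<major>[.<minor>...] pattern.
// Naming convention only: this does not read DT_SONAME from the file.
function conventionalSoname(fileName) {
  const match = /^(.+\.so\.\d+)(?:\.\d+)*$/.exec(fileName);
  return match ? match[1] : null;
}
```

Storing the library in the zip under this name removes the need for a symlink altogether.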

jberkenbilt commented 4 years ago

> You can also create a globally shareable layer in Lambda, but I'm not sure it's worth the trouble unless it could be automated in the CI.

Yeah, I don't think I'll do that. I don't want the responsibility of having an object in AWS that unknown people may be depending on.

jberkenbilt commented 3 years ago

@ru-zar @Sparticuz Starting in the next version (probably 10.0.2), a distribution called qpdf-<version>-bin-linux-x86_64.zip is created along with the other distributions. This works as a lambda layer -- I tested it. Thanks for this suggestion. I basically added about four lines of code to the appimage creation script to grab the relevant files at the same time and put them in a zip file.

@Sparticuz Thanks for your idea to grab stuff from the AppImage. After 10.0.2 is released, you won't need your special Makefile anymore -- you can directly grab the zip file right from qpdf releases.
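For anyone assembling a layer by hand: Lambda extracts a layer's zip into /opt and adds /opt/bin to PATH and /opt/lib to LD_LIBRARY_PATH. The bin/ and lib/ directories therefore need to sit at the root of the zip. The sketch below maps zip entries to their location on the Lambda host and flags a zip whose qpdf is nested one level too deep (for example under layer/). `optPath` and `checkLayerLayout` are hypothetical helpers, and the entry list is assumed to come from whatever zip tool you use.

```javascript
const path = require("path");

// Lambda extracts a layer zip into /opt: "bin/qpdf" becomes "/opt/bin/qpdf".
function optPath(zipEntry) {
  return path.posix.join("/opt", zipEntry);
}

// Returns a list of layout problems (empty if none) for the zip's entry names.
function checkLayerLayout(zipEntries) {
  const problems = [];
  if (!zipEntries.includes("bin/qpdf")) {
    problems.push("bin/qpdf is not at the root of the zip");
  }
  if (!zipEntries.some((entry) => entry.startsWith("lib/"))) {
    problems.push("lib/ is not at the root of the zip");
  }
  return problems;
}
```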

aaronbrezel commented 1 year ago

Hello @jberkenbilt,

Thanks so much for putting in the time to make qpdf a standalone binary. I am attempting to add qpdf as a layer in a python3.8 Lambda runtime deployed through Serverless and am running into some issues. I was hoping you might have some ideas.

First the error thrown by the lambda:

qpdf: error while loading shared libraries: /opt/lib/libqpdf.so.29: file too short

This points to an error setting up the qpdf layer.

We download the layer using curl:

curl -L -o qpdf-11.2.0-bin-linux-x86_64.zip https://github.com/qpdf/qpdf/releases/download/v11.2.0/qpdf-11.2.0-bin-linux-x86_64.zip

and unzip the file, creating a directory structure:

layer/
- bin/
- lib/

We then add the layer/ to our lambda.

However, when we attempt to execute qpdf through a Python subprocess, we are met with the error above.

Are we missing something important when constructing the layer?

Sparticuz commented 1 year ago

You may need to make sure that LD_LIBRARY_PATH includes /opt/lib

aaronbrezel commented 1 year ago

> You may need to make sure that LD_LIBRARY_PATH includes /opt/lib

Thanks for the quick reply! We do have that env variable set on our lambda.


Unfortunately, I still see the same error.

Sparticuz commented 1 year ago

Actually, I'm re-reading your post. You don't need to unzip the qpdf zip; just upload the zip you downloaded directly. "File too short" sounds like the .so file is busted.

EDIT:

curl -L -o qpdf-11.2.0-bin-linux-x86_64.zip https://github.com/qpdf/qpdf/releases/download/v11.2.0/qpdf-11.2.0-bin-linux-x86_64.zip
aws lambda publish-layer-version --layer-name qpdf --description "qpdf v 11.2.0" --zip-file fileb://qpdf-11.2.0-bin-linux-x86_64.zip --compatible-architectures x86_64

I think that should work and return an ARN to use with Serverless (sls):

functions:
  qpdf-function:
...
    layers:
      - arn:aws:lambda:us-east-1:***************:layer:qpdf:##
...
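The loader's "file too short" message means it could not read a complete ELF header from the file. That fits the theory that the .so was damaged when the zip was extracted and repacked. One possible cause, assumed here and not confirmed in this thread, is a tool that stores a symlink as a tiny text file containing the link target. The Node.js sketch below lists shared-library files in a directory that do not begin with a full 64-bit ELF header (64 bytes, starting 0x7f 'E' 'L' 'F' with class byte 2). `looksLikeElf64` and `findSuspectLibraries` are hypothetical helpers.

```javascript
const fs = require("fs");
const path = require("path");

// Size of an ELF64 file header.
const ELF64_HEADER_SIZE = 64;

// True if the file starts with a complete 64-bit ELF header:
// magic 0x7f 'E' 'L' 'F' followed by EI_CLASS = 2 (ELFCLASS64).
function looksLikeElf64(filePath) {
  const fd = fs.openSync(filePath, "r");
  try {
    const header = Buffer.alloc(ELF64_HEADER_SIZE);
    const bytesRead = fs.readSync(fd, header, 0, ELF64_HEADER_SIZE, 0);
    return (
      bytesRead === ELF64_HEADER_SIZE &&
      header[0] === 0x7f &&
      header.toString("latin1", 1, 4) === "ELF" &&
      header[4] === 2
    );
  } finally {
    fs.closeSync(fd);
  }
}

// Lists regular files named like shared libraries (foo.so, foo.so.29, ...)
// in libDir that do not look like 64-bit ELF objects. Symlinks are followed.
function findSuspectLibraries(libDir) {
  return fs
    .readdirSync(libDir)
    .filter((name) => /\.so(\.\d+)*$/.test(name))
    .filter((name) => {
      const full = path.join(libDir, name);
      return fs.statSync(full).isFile() && !looksLikeElf64(full);
    })
    .sort();
}
```

Running `findSuspectLibraries` on the unpacked layer's lib/ directory before publishing would show whether libqpdf.so.29 arrived intact.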
jberkenbilt commented 1 year ago

I have this in my work environment:

qpdf_version=11.2.0
curl -L https://github.com/qpdf/qpdf/releases/download/v${qpdf_version}/qpdf-${qpdf_version}-bin-linux-x86_64.zip -o layer.zip
aws lambda publish-layer-version \
    --layer-name qpdf11-layer \
    --zip-file fileb://layer.zip

I don't personally use Serverless (I prefer to control infrastructure with Terraform and then use the AWS CLI or scripts to do deploys), but I imagine you could add the layer as a resource in your serverless.yml. I don't use CloudFormation very often, so I don't have the syntax for Lambda layers in my head. I don't control my Lambda layer versions in Terraform because of the way layers work. Instead, I just publish them from the CLI in my deployment scripts and have a separate job that prunes outdated layer versions.

But as other people pointed out, the zip file is intended to be used as-is as a layer. If you wanted to combine it with another layer, you could extract it into the root of whatever you are zipping.
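A job that "prunes outdated layer versions" could be built around a small selection function like the one below. Given the version numbers of a layer, it returns everything except the newest `keep` versions. The function name and retention policy are hypothetical. Feeding it from `aws lambda list-layer-versions` and passing each result to `aws lambda delete-layer-version --layer-name ... --version-number N` is an assumed workflow, not a description of the actual job.

```javascript
// Given the version numbers of one Lambda layer, return the versions that
// fall outside the newest `keep` versions, in ascending order.
function versionsToPrune(versionNumbers, keep) {
  if (!Number.isInteger(keep) || keep < 0) {
    throw new RangeError("keep must be a non-negative integer");
  }
  // Remove duplicates and order newest (highest number) first.
  const newestFirst = [...new Set(versionNumbers)].sort((a, b) => b - a);
  return newestFirst.slice(keep).sort((a, b) => a - b);
}
```

For example, with `keep` set to 2, a layer with versions 1 through 5 would have versions 1, 2, and 3 selected for deletion.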

aaronbrezel commented 1 year ago

Publishing the zip file as a layer appears to do the trick! Thank you for the help, guys. It's much appreciated 🙏

dakab1 commented 12 months ago

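Hi,

I get the following error: qpdf: /lib64/libc.so.6: version `GLIBC_2.25' not found (required by /opt/lib/libgnutls.so.30)

I followed jberkenbilt's suggestion from earlier in this thread (publishing the release zip with aws lambda publish-layer-version). Any suggestions would be welcome :-)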


jberkenbilt commented 11 months ago

@dakab1 Please create a new issue. Comments on closed issues are likely to be overlooked. It's possible that the binary images of recent qpdf builds may use libraries that are too new to work unmodified on docker. If that's the case, I will need to change my binary distribution build.
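The error means that a library in the layer (here libgnutls.so.30) needs symbols versioned GLIBC_2.25, while the C library on the Lambda host is older. `objdump -T` on the library lists the GLIBC_ versions it requires. A quick comparison against the runtime's glibc might look like the Node.js sketch below. `parseGlibc`, `glibcSatisfies`, and `runtimeGlibc` are hypothetical helpers. `runtimeGlibc` relies on Node's diagnostic report (`process.report`), which may be unavailable on older Node versions or non-glibc systems; in that case it returns undefined.

```javascript
// Parse "GLIBC_2.25" or "2.26" into [major, minor].
function parseGlibc(versionString) {
  const match = /^(?:GLIBC_)?(\d+)\.(\d+)/.exec(versionString);
  if (!match) throw new Error(`unrecognized glibc version: ${versionString}`);
  return [Number(match[1]), Number(match[2])];
}

// True if the available glibc is at least the required version.
function glibcSatisfies(available, required) {
  const [aMajor, aMinor] = parseGlibc(available);
  const [rMajor, rMinor] = parseGlibc(required);
  return aMajor > rMajor || (aMajor === rMajor && aMinor >= rMinor);
}

// glibc version of the running process as reported by Node on Linux,
// or undefined if the diagnostic report is unavailable.
function runtimeGlibc() {
  const header = process.report && process.report.getReport().header;
  return header ? header.glibcVersionRuntime : undefined;
}
```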

dakab1 commented 11 months ago

Thanks for your reply. Done.
