samoconnor / lambdalatex

Latex TeX Live for AWS Lambda
MIT License

Does not work in Amazon Linux 2018.03 OS version #3

Open suriya opened 5 years ago

suriya commented 5 years ago

In https://aws.amazon.com/blogs/compute/upcoming-updates-to-the-aws-lambda-execution-environment/ AWS announced an upgrade to the Lambda execution environment from Amazon Linux version 2017.03 to version 2018.03.

lambdalatex does not work in the newer environment 2018.03. The latexmk command in lambdalatex needs perl. However, /usr/bin/perl which was present in 2017.03 Lambda images is removed from 2018.03 images.

To use lambdalatex we need a way to make perl available in the Lambda function. I tried the layer mentioned in https://github.com/moznion/aws-lambda-perl5-layer. However, I got signal 11 while running latexmk. It is possible that the layer is built against 2017.03. I tried to build my own layer but got exit code 127 while invoking latexmk --version. I am not an expert in perl, nor in the texlive distribution, and I am unable to make further progress.

Do you have any thoughts on how to make lambdalatex work on Amazon Linux 2018.03 OS version?
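
The two failures described above (signal 11 and exit code 127) can be told apart from inside the function. The following is a minimal sketch, not part of lambdalatex; `describe_exit` and `check_tool` are hypothetical helper names. It relies on two general conventions: Python's subprocess module reports a process killed by signal N as return code -N, and exit status 127 conventionally means a command or interpreter could not be found.

```python
import shutil
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short diagnostic string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # subprocess reports "killed by signal N" as the negative value -N.
        name = signal.Signals(-returncode).name
        return f"killed by signal {-returncode} ({name})"
    if returncode == 127:
        # Conventional "command not found" status, e.g. from /usr/bin/env.
        return "exit 127: command or interpreter not found"
    return f"exit {returncode}"


def check_tool(argv):
    """Report whether argv[0] is on PATH and, if it is, how it exits."""
    if shutil.which(argv[0]) is None:
        return f"{argv[0]} is not on PATH"
    result = subprocess.run(argv,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return describe_exit(result.returncode)
```

Calling `check_tool(["perl", "-v"])` and `check_tool(["latexmk", "--version"])` at the top of the handler and printing the results should show whether perl is missing entirely or present but crashing.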

johnstrickler commented 5 years ago

That may explain why I'm getting an error that "document.pdf" is not found.

/usr/bin/env: perl: No such file or directory

{
  "errorMessage": "[Errno 2] No such file or directory: 'document.pdf'",
  "errorType": "FileNotFoundError",
  "stackTrace": [
    [
      "/var/task/lambda_function.py",
      47,
      "lambda_handler",
      "event['output_key'])"
    ],
wzard commented 5 years ago

Ran into the same issue. Previous Lambda Function works just fine. @suriya Were you able to figure out how to add perl support to the new env?

suriya commented 5 years ago

@wzard I have not made further progress.

Karandaras commented 5 years ago

@wzard @suriya Same problem here, but I think I am on the right track. I built a custom Perl layer based on the linked Perl layer, and it seems to work.

Here is my Dockerfile for it:

FROM lambci/lambda:build

ARG PERL_VERSION
RUN yum install -y zip curl
RUN curl -L https://raw.githubusercontent.com/tokuhirom/Perl-Build/master/perl-build > /tmp/perl-build
RUN perl /tmp/perl-build ${PERL_VERSION} /opt/ -des -Dcf_by="Red Hat, Inc." -Darchname=x86_64-linux-thread-multi -Dusethreads -Duseithreads -Dusesitecustomize

WORKDIR /opt

And then just build and package the stuff:

docker build --rm --build-arg PERL_VERSION=5.16.3 -t perllayer .
docker run --rm -it -v "YOUR OUTPUT DIRECTORY HERE":/var/host perllayer zip --symlinks -r -9 /var/host/PerlLayer.zip .
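
Before uploading the archive, it may be worth confirming that it has the expected layout. Lambda extracts layers under /opt, so an entry named bin/perl in the zip ends up at /opt/bin/perl at run time. A minimal sketch of such a check, where `layer_has_perl` is a hypothetical helper and the default member name assumes the build above installed Perl into /opt:

```python
import zipfile


def layer_has_perl(zip_path, member="bin/perl"):
    """Return True if the layer archive contains the Perl executable.

    zip_path may be a filename or a file-like object. Entry names are
    normalized so that "./bin/perl" and "bin/perl" are treated alike.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = {
            name[2:] if name.startswith("./") else name
            for name in archive.namelist()
        }
    return member in names
```

For example, `layer_has_perl("PerlLayer.zip")` should return True for an archive built with the commands above.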

Tried some simple cases and it did the job, but needs some more testing after the weekend. Feel free to try it out.

Have a nice weekend

nhoffman commented 5 years ago

@samoconnor - thanks a lot for this - it was just what I needed. And thanks to @Karandaras for the comment above. I can confirm that adding the perl runtime in this way works. Layers would be better, but in the short term I just added the perl dependencies directly to the same zip file. My minor modifications:

Remove /man from the docker image to reduce the zip file size:

RUN rm -r /opt/man

and include the paths to the perl executable and libs in the script:

    os.environ['PATH'] += ":/var/task/bin/"
    os.environ['PATH'] += ":/var/task/texlive/2019/bin/x86_64-linux/"

    os.environ['PERL5LIB'] = '/var/task/lib/perl5/5.16.3/'
    os.environ['PERL5LIB'] += ":/var/task/texlive/2019/tlpkg/TeXLive/"

As an aside, the approach used in this project to install texlive does not pin to a version, so the user needs to be sure to update the path to texlive accordingly. In my version I plan to parameterize both this and the perl version.
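
One way the parameterization could look is sketched below. This is an illustration only, not code from this project: `latex_environment` is a hypothetical helper, and the directory layout is assumed to match the paths shown in the snippet above.

```python
import os


def latex_environment(texlive_year, perl_version,
                      root="/var/task", base_env=None):
    """Return a copy of the environment with PATH and PERL5LIB set.

    The layout under `root` is an assumption taken from the paths in
    this thread; adjust it if your package is organized differently.
    """
    env = dict(os.environ if base_env is None else base_env)
    texlive = f"{root}/texlive/{texlive_year}"

    # Append the perl and TeX Live binary directories to any existing PATH.
    path_parts = [p for p in env.get("PATH", "").split(":") if p]
    path_parts += [f"{root}/bin", f"{texlive}/bin/x86_64-linux"]
    env["PATH"] = ":".join(path_parts)

    # Perl core modules first, then the TeX Live Perl modules.
    env["PERL5LIB"] = ":".join([
        f"{root}/lib/perl5/{perl_version}",
        f"{texlive}/tlpkg/TeXLive",
    ])
    return env
```

The two versions could then come from function configuration, for instance `os.environ.update(latex_environment(os.environ["TEXLIVE_YEAR"], os.environ["PERL_VERSION"]))`, where both variable names are made up for this example.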

johnstrickler commented 5 years ago

Thanks @Karandaras, I was able to get my lambda function to work again. Also thanks @nhoffman for the optimizations.

My lambda execution time was 7.6 seconds using 128 MB of memory for a simple latex document. Not too shabby.

suriya commented 5 years ago

@Karandaras Worked for me as well. Thank you!

kevcam4891 commented 4 years ago

Thanks to all for this support in this area specifically. It took me about 3 hours to go from git clone to producing a PDF of my own in lambda/latex. The perl issue was certainly the toughest nut to crack, particularly because you can go about it in a number of ways (standalone layer, include in the latex lambda itself) but this guidance was very helpful.

Including both latex and perl in one lambda is difficult. I already had to strip down latex to NO extra packages, etc to get everything in under the 50MB limit. I'm thinking splitting this up into Perl in its own layer will ultimately be the way to go.
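
When trimming the package to fit under the limit, it helps to see which entries take the most space in the archive. A rough sketch, assuming the 50 MB figure mentioned above applies to the zipped upload; `package_report` is a hypothetical helper:

```python
import os
import zipfile

MB = 1024 * 1024


def package_report(zip_path, zipped_limit_mb=50, top=10):
    """Summarize a deployment zip: total size and its largest entries."""
    zipped_bytes = os.path.getsize(zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        # Sort by compressed size, since that is what counts in the zip.
        entries = sorted(archive.infolist(),
                         key=lambda info: info.compress_size,
                         reverse=True)
    return {
        "zipped_mb": zipped_bytes / MB,
        "fits": zipped_bytes <= zipped_limit_mb * MB,
        "largest": [(e.filename, e.compress_size) for e in entries[:top]],
    }
```

The "largest" list is a starting point for deciding what to delete in the Dockerfile, such as documentation or unused encodings.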

To anyone who wants to just get going with Latex/Perl together to prove out the concept, I'm attaching a Dockerfile below. It is simply a cookbook based on all the helpful comments above. It can probably be optimized even more, but this "works":

Dockerfile

FROM lambci/lambda:build-python3.6

# Install Perl
ARG PERL_VERSION
RUN yum install -y zip curl
RUN curl -L https://raw.githubusercontent.com/tokuhirom/Perl-Build/master/perl-build > /tmp/perl-build
RUN perl /tmp/perl-build ${PERL_VERSION} /opt/ -des -Dcf_by="Red Hat, Inc." -Darchname=x86_64-linux-thread-multi -Dusethreads -Duseithreads -Dusesitecustomize

# The TeXLive installer needs md5 and wget.
RUN yum -y install perl-Digest-MD5 && \
    yum -y install wget

RUN mkdir /var/src
WORKDIR /var/src

# Download TeXLive installer.
ADD http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz /var/src/
#RUN pwd && ls -lah /var/src
#COPY install-tl-unx.tar.gz /var/src/

# Minimal TeXLive configuration profile.
COPY texlive.profile /var/src/

# Install base TeXLive system.
RUN tar xf install*.tar.gz
RUN cd install-tl-* && \
    ./install-tl --profile ../texlive.profile
    # --location http://ctan.mirror.norbert-ruehl.de/systems/texlive/tlnet

ENV PATH=/var/task/texlive/2017/bin/x86_64-linux/:$PATH

# Install extra packages.
#RUN tlmgr install xcolor \
#                  tcolorbox \
#                  pgf \
#                  environ \
#                  trimspaces \
#                  etoolbox \
#                  booktabs \
#                  lastpage \
#                  pgfplots \
#                  marginnote \
#                  tabu \
#                  varwidth \
#                  makecell \
#                  enumitem \
#                  setspace \
#                  xwatermark \
#                  catoptions \
#                  ltxkeys \
#                  framed \
#                  parskip \
#                  endnotes \
#                  footmisc \
#                  zapfding \
#                  symbol \
#                  lm \
#                  sectsty \
#                  stringstrings \
#                  koma-script \
#                  multirow \
#                  calculator \
#                  adjustbox \
#                  xkeyval \
#                  collectbox \
#                  siunitx \
#                  l3kernel \
#                  l3packages \
#                  helvetic \
#                  charter

# Install latexmk.
RUN tlmgr install latexmk

# Remove LuaTeX.
RUN tlmgr remove --force luatex

# Remove large unneeded files.
RUN rm -rf /var/task/texlive/2017/tlpkg/texlive.tlpdb* \
           /var/task/texlive/2017/texmf-dist/source/latex/koma-script/doc \
           /var/task/texlive/2017/texmf-dist/doc 

RUN mkdir -p /var/task/texlive/2017/tlpkg/TeXLive/Digest/ && \
    mkdir -p /var/task/texlive/2017/tlpkg/TeXLive/auto/Digest/MD5/ && \
    cp /usr/lib64/perl5/vendor_perl/Digest/MD5.pm \
      /var/task/texlive/2017/tlpkg/TeXLive/Digest/ && \
    cp /usr/lib64/perl5/vendor_perl/auto/Digest/MD5/MD5.so \
      /var/task/texlive/2017/tlpkg/TeXLive/auto/Digest/MD5

# Remove perl libraries that don't get used so we can get under the 50 MB limit
RUN rm -rf /opt/lib/perl5/5.16.3/x86_64-linux-thread-multi/auto/Encode/CN
RUN rm -rf /opt/lib/perl5/5.16.3/x86_64-linux-thread-multi/auto/Encode/JP
RUN rm -rf /opt/lib/perl5/5.16.3/x86_64-linux-thread-multi/auto/Encode/KR
RUN rm -rf /opt/lib/perl5/5.16.3/x86_64-linux-thread-multi/auto/Encode/TW

FROM lambci/lambda:build-python3.6

WORKDIR /var/task

ENV PATH=/var/task/texlive/2017/bin/x86_64-linux/:/opt/:$PATH
ENV PERL5LIB=/var/task/texlive/2017/tlpkg/TeXLive/

COPY --from=0 /var/task/ /var/task/
COPY --from=0 /opt/bin/perl /var/task/bin/
COPY --from=0 /opt/lib /var/task/lib
COPY lambda_function.py /var/task

RUN ls -lah /var/task

lambda_function.py: mostly like the author's, but with the extra PERL5LIB and PATH entries added.

import os
import io
import shutil
import subprocess
import base64
import zipfile
import boto3

def lambda_handler(event, context):

    # Extract input ZIP file to /tmp/latex...
    shutil.rmtree("/tmp/latex", ignore_errors=True)
    os.mkdir("/tmp/latex")

    print(event)

    if 'input_bucket' in event:
        r = boto3.client('s3').get_object(Bucket=event['input_bucket'],
                                          Key=event['input_key'])
        zip_bytes = r["Body"].read()
    else:
        zip_bytes = base64.b64decode(event["input"])

    z = zipfile.ZipFile(io.BytesIO(zip_bytes))
    z.extractall(path="/tmp/latex")

    os.environ['PATH'] += ":/var/task/bin"
    os.environ['PATH'] += ":/var/task/texlive/2017/bin/x86_64-linux/"
    os.environ['HOME'] = "/tmp/latex/"

    os.environ['PERL5LIB'] = "/var/task/lib/perl5/5.16.3/"
    os.environ['PERL5LIB'] += ":/var/task/texlive/2017/tlpkg/TeXLive/"

    os.chdir("/tmp/latex/")

    # Run latexmk (which drives pdflatex)...
    r = subprocess.run(["latexmk",
                        "-verbose",
                        "-interaction=batchmode",
                        "-pdf",
                        "-output-directory=/tmp/latex",
                        "document.tex"],
                       stdout=subprocess.PIPE,
                       stderr=subprocess.STDOUT)
    print(r.stdout.decode('utf_8'))

    if "output_bucket" in event:
        boto3.client('s3').upload_file("document.pdf",
                                       event['output_bucket'],
                                       event['output_key'])
        return {
            "stdout": r.stdout.decode('utf_8')
        }

    else:
        # Read "document.pdf"...
        with open("document.pdf", "rb") as f:
            pdf = f.read()

        # Return base64 encoded pdf and stdout log from pdflatex...
        return {
            "output": base64.b64encode(pdf).decode('ascii'),
            "stdout": r.stdout.decode('utf_8')
        }
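
As written, the handler goes on to read or upload document.pdf even when latexmk failed, which is how a missing perl ends up reported as a FileNotFoundError for 'document.pdf' earlier in this thread. A minimal sketch of a guard that surfaces the build log instead; `run_build` and `LatexBuildError` are hypothetical names, not part of the original function:

```python
import os
import subprocess


class LatexBuildError(RuntimeError):
    """Raised when the build command fails or produces no output file."""


def run_build(argv, expected_output, cwd):
    """Run a build command in cwd and return its combined log.

    Raises LatexBuildError if the command exits non-zero or the
    expected output file was not created.
    """
    result = subprocess.run(argv, cwd=cwd,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    log = result.stdout.decode("utf-8", errors="replace")
    output_path = os.path.join(cwd, expected_output)
    if result.returncode != 0 or not os.path.isfile(output_path):
        # Surface the tool's own log rather than a later FileNotFoundError.
        raise LatexBuildError(
            f"build failed (exit {result.returncode}):\n{log}")
    return log
```

In the handler this would replace the bare subprocess.run call, for example `log = run_build(["latexmk", "-pdf", "document.tex"], "document.pdf", "/tmp/latex")`.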

Test Event: Once you've uploaded your lambda, paste this into the "Configure test event" panel in the lambda console.

{
  "input": "UEsDBBQACAAIAHM5JFEAAAAAAAAAAFgAAAAMACAAZG9jdW1lbnQudGV4VVQNAAfqIFJfOiNSX/5NUl91eAsAAQT1AQAABBQAAACLSclPLs1NzStJzkksLo42NCwo0clJLSlJLSpILEgtiq1OLCrJTM5JreWKSUpNz8yrhqmv5fJIzcnJVwjPL8pJUeSKSc1LQZLjAgBQSwcIlMFj5UsAAABYAAAAUEsBAhQDFAAIAAgAczkkUZTBY+VLAAAAWAAAAAwAIAAAAAAAAAAAAKSBAAAAAGRvY3VtZW50LnRleFVUDQAH6iBSXzojUl/+TVJfdXgLAAEE9QEAAAQUAAAAUEsFBgAAAAABAAEAWgAAAKUAAAAAAA=="
}
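
To build a test event from your own source, note that the handler expects "input" to be a base64-encoded zip archive containing document.tex and returns the PDF base64-encoded under "output". A small sketch of both directions; `make_test_event` and `save_pdf` are hypothetical helper names:

```python
import base64
import io
import zipfile


def make_test_event(tex_source):
    """Wrap LaTeX source as the {"input": ...} event the handler expects."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The handler always compiles a file named document.tex.
        archive.writestr("document.tex", tex_source)
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return {"input": encoded}


def save_pdf(response, path):
    """Decode the "output" field of the handler's response into a PDF file."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(response["output"]))
```

Passing the result of `make_test_event(...)` through `json.dumps` gives text that can be pasted into the console.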
lpinilla commented 2 years ago

I would like to point out a quicker solution:

Using the layer from ARN arn:aws:lambda:us-east-1:445285296882:layer:perl-5-34-runtime-al2-x86_64:4 (grabbed from here) works.

The input example from the repo took about 1 min to run. I don't know whether that is the layer's fault or Python's, but it worked.

I would also like to point out that this layer ARN comes from the internet, and you shouldn't trust it blindly just because it works. Please don't use it for sensitive documents; for a long-term solution, build your own perl layer.