okigan / awscurl

curl-like access to AWS resources with AWS Signature Version 4 request signing.
MIT License
737 stars 91 forks

Compile to binary #128

Open speller opened 2 years ago

speller commented 2 years ago

It would be nice to be able to compile awscurl to a binary for optimal disk space usage, especially in Docker containers. It's hard to pull in all the dependencies the program needs to work. In particular, I'm building an image for CI/CD that will have awscurl installed.

okigan commented 2 years ago

@speller please add more info what/which dependency is causing issues.

speller commented 2 years ago

> @speller please add more info what/which dependency is causing issues.

I'm not proficient in Python, so I can't say what's missing, and I don't know anything about compiling Python programs to binaries. But having a binary published in GitHub releases would be super useful. It would also be super nice if it worked on Alpine Linux.

okigan commented 2 years ago

The Docker images are actively used and are built on Alpine. See: https://github.com/okigan/awscurl/blob/master/Dockerfile

speller commented 2 years ago

What's the binary path, then? As I understand it, the entrypoint runs awscurl as a Python module, not as a binary: ENTRYPOINT ["python", "-m", "awscurl.awscurl"]
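For context on the ENTRYPOINT question: `python -m awscurl.awscurl` doesn't refer to a binary at all. The interpreter finds the `awscurl.awscurl` module on `sys.path` and executes it with `__name__` set to `"__main__"`, the same mechanism the standard library exposes as `runpy.run_module`. A rough sketch of that step, using a hypothetical helper name:

```python
import runpy
import sys


def run_as_main(module, args):
    """Roughly what `python -m <module> <args...>` does.

    The module is located via sys.path and executed with __name__ == "__main__",
    so its `if __name__ == "__main__":` block runs. Returns the module's globals.
    """
    saved_argv = sys.argv[:]
    # runpy replaces argv[0] with the module's file path while it runs (alter_sys=True).
    sys.argv = [module, *args]
    try:
        return runpy.run_module(module, run_name="__main__", alter_sys=True)
    finally:
        # Restore the caller's argv even if the module raises or calls sys.exit().
        sys.argv = saved_argv
```

Assuming awscurl is installed, `run_as_main("awscurl.awscurl", ["--help"])` should behave much like the image's entrypoint. This is also why PyInstaller needs a small entry script: it takes a script path, not a `-m` module name.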

speller commented 2 years ago

Could you let me know which files I should copy from the awscurl Docker image to make it work locally in another Alpine-based image?

speller commented 2 years ago

I've managed to compile it myself. Here is my Dockerfile code that builds both awscli and awscurl. The awscurl part reuses almost everything from the awscli setup process; I haven't had time to strip out the awscli part to leave awscurl only. The difficulty is getting a PyInstaller bootloader binary for Alpine, which doesn't exist by default; that's why all these workarounds were made (aws-cli v2 doesn't have an official Alpine image).

# AWS CLI installation based on https://github.com/aws/aws-cli/issues/4685#issuecomment-829600284
ARG PYTHON_VERSION
ARG ALPINE_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION}-alpine${ALPINE_VERSION} AS installer

RUN apk add --no-cache \
    curl \
    unzip \
    gcc \
    git \
    libc-dev \
    libffi-dev \
    openssl-dev \
    py3-pip \
    zlib-dev \
    make \
    cmake

ARG AWSCLI_VERSION
RUN git clone --recursive  --depth 1 --branch ${AWSCLI_VERSION} --single-branch https://github.com/aws/aws-cli.git \
    && cd /aws-cli \
    # Follow https://github.com/six8/pyinstaller-alpine to install pyinstaller on alpine
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir pycrypto \
    && git clone --depth 1 --single-branch --branch v$(grep PyInstaller requirements-build.txt | cut -d'=' -f3) https://github.com/pyinstaller/pyinstaller.git /tmp/pyinstaller \
    && cd /tmp/pyinstaller/bootloader \
    && CFLAGS="-Wno-stringop-overflow -Wno-stringop-truncation" python ./waf configure --no-lsb all \
    && pip install .. \
    && rm -Rf /tmp/pyinstaller \
    && cd - \
    && boto_ver=$(grep botocore setup.cfg | cut -d'=' -f3) \
    && git clone --single-branch --branch v2 https://github.com/boto/botocore /tmp/botocore \
    && cd /tmp/botocore \
    && git checkout $(git log --grep $boto_ver --pretty=format:"%h") \
    && pip install . \
    && rm -Rf /tmp/botocore  \
    && cd - \
    && sed -i '/botocore/d' requirements.txt \
    && scripts/installers/make-exe \
    && unzip dist/awscli-exe.zip \
    && ./aws/install --bin-dir /aws-cli-bin

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl
...

FROM docker:${DOCKER_VERSION}
...
COPY --from=installer /usr/local/aws-cli/ /usr/local/aws-cli/
COPY --from=installer /aws-cli-bin/ /usr/local/bin/
COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCLI_VERSION=2.2.32
AWSCURL_VERSION=0.24
PYTHON_VERSION=3.9.7
ALPINE_VERSION=3.14

The new entrypoint file cli.py is pretty standard:

from awscurl.__main__ import main

if __name__ == "__main__":
    main()

If you make an awscurl-only Dockerfile, I guess you'll still need the requirements-build.txt file from awscli for setup purposes, since the steps above read the PyInstaller version from it.
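The same PyInstaller invocation can also be driven from Python: PyInstaller documents `PyInstaller.__main__.run()` for running it from code, taking the same arguments as the command line. A minimal sketch, assuming PyInstaller is installed and `cli.py` is the entry script shown above; `pyinstaller_args` and `build` are hypothetical helper names. Note the hidden import is spelled `configargparse`, the module name the package is actually imported under.

```python
def pyinstaller_args(entry_script="cli.py", name="awscurl",
                     hidden_imports=("configargparse", "requests")):
    """Build the argument list for the `pyinstaller` command used in the Dockerfile."""
    args = [entry_script, "--onefile", "--name", name]
    for module in hidden_imports:
        # Each --hidden-import must be an importable module name.
        args.append(f"--hidden-import={module}")
    return args


def build():
    # Imported lazily so pyinstaller_args() works without PyInstaller installed.
    import PyInstaller.__main__
    PyInstaller.__main__.run(pyinstaller_args())
```

Calling `build()` from the awscurl checkout should leave the binary at `dist/awscurl`, as the `COPY --from=installer` lines expect.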

speller commented 1 year ago

The following Dockerfile code compiles the binary under Python 3.9 on Alpine 3.16:

ARG PYTHON_VERSION
ARG DOCKER_VERSION

FROM python:${PYTHON_VERSION} AS installer

RUN set -ex; \
    apk add --no-cache \
    git \
    unzip \
    groff \
    curl \
    build-base \
    libffi-dev \
    cmake

COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN set -eux \
    && cd / \
    && git clone --recursive  --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && pip install pyinstaller==4.10 \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl

FROM docker:${DOCKER_VERSION}

COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl

Versions:

DOCKER_VERSION=20.10.8
AWSCURL_VERSION=0.26
PYTHON_VERSION=3.9-alpine3.16

This lets me add only the binary to my image without pulling in Python and the raw sources. @okigan would you consider shipping only the binary in the Docker image instead of the sources? It doesn't make sense to pull in Python when only awscurl is required, and it would also simplify adding awscurl to custom Docker images. Keeping images as small as possible matters in deployment pipelines where many images are downloaded often, since bigger images slow down the whole process.
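Whichever way the binary ends up in an image, a quick check that it actually starts there is cheap. A minimal sketch, assuming the binary is named `awscurl` and that `--help` exits with status 0 (awscurl parses its options with configargparse, which provides `--help` the way argparse does); `smoke_test` is a hypothetical helper. A glibc-linked binary copied into Alpine typically fails at exec time, which surfaces here as an `OSError`.

```python
import shutil
import subprocess


def smoke_test(binary="awscurl", args=("--help",), timeout=30):
    """Return True if `binary args...` starts and exits with status 0."""
    # Resolve via PATH if possible; otherwise treat `binary` as a path.
    path = shutil.which(binary) or binary
    try:
        result = subprocess.run([path, *args], capture_output=True,
                                timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        # Missing file, missing loader/libc, wrong architecture, or a hung process.
        return False
    return result.returncode == 0
```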

okigan commented 1 year ago

First of all, thank you for looking into this!

I have not used pyinstaller before, so I looked at the relevant docs. Some of its caveats make me concerned that this may trip up some users.

Also, if awscurl were compiled to an executable, I would like more context on how it would be distributed/consumed (feel free to respond here or grab some time at https://calendly.com/okigan/30min).
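One concrete example of the kind of caveat involved: in a `--onefile` build, PyInstaller unpacks the bundle into a temporary directory at every start-up and sets `sys.frozen` and `sys._MEIPASS` so the program can tell where it is running from. Any data files the program expects alongside its sources must be bundled explicitly and are then found under that directory. The sketch below follows the pattern in PyInstaller's run-time information docs; whether awscurl reads any such files is not established here, and the helper names are hypothetical.

```python
import os
import sys


def is_frozen():
    """True when running inside a PyInstaller bundle (PyInstaller sets sys.frozen)."""
    return getattr(sys, "frozen", False)


def resource_dir(source_file):
    """Directory to read bundled data files from.

    In a --onefile build this is the temporary extraction directory that
    PyInstaller records in sys._MEIPASS; otherwise it is the directory of the
    calling module (pass that module's __file__ as `source_file`).
    """
    if is_frozen() and hasattr(sys, "_MEIPASS"):
        return sys._MEIPASS
    return os.path.dirname(os.path.abspath(source_file))
```

The per-launch extraction also means a onefile binary needs a writable temporary directory and pays a small start-up cost each run.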

speller commented 1 year ago

@okigan My use cases:

1) Using the awscurl Docker image in a CI/CD environment when the job is not heavy and I need to perform some tasks with AWS. In this case the size of the image matters: the smaller the image, the faster the job, and the faster the pipeline.

2) A complex job in a pipeline. In this case, I build a custom Docker image with the set of tools I need preinstalled, instead of pulling each tool as a Docker image or installing it in other ways. Here the size of the tools and the ease of installation matter. For awscurl, if there were an image containing the binary, I would only need to add one line to my Dockerfile:

COPY --from=okigan/awscurl /usr/local/bin/awscurl /usr/local/bin/awscurl

Otherwise, without the binary, I have to install the sources and Python to make it work, which increases the resulting image size significantly. As you can see in my latest example, I use a multi-stage build to compile the binary and then copy only the binary into my image, dropping Python, the sources, and all the dependencies. I add many tools to my image, so, again, size is important. I build my custom image on top of the Docker base image (which is based on Alpine). If you make an image containing only the binary, you would most likely use the plain Alpine base image.
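To put rough numbers on this trade-off inside a container, one can compare what a Python-based install occupies on disk (standard library plus site-packages) with the size of the single binary. The helper below is a hypothetical sketch that sums regular-file sizes under a directory, similar in spirit to `du`; it measures files on disk, not the compressed layer sizes a registry reports.

```python
import os


def tree_size(root):
    """Total size in bytes of regular files under `root` (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                pass
    return total
```

For example, `tree_size(sysconfig.get_paths()["stdlib"]) + tree_size(sysconfig.get_paths()["purelib"])` could be set against `os.path.getsize("/usr/local/bin/awscurl")` in the final image.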

Does this clarify the context of the binary usage?

speller commented 1 year ago

You could also distribute the tool as precompiled binaries for different platforms if you wish. I install some tools in my images by downloading binaries, which also saves size and time.

okigan commented 1 year ago

So I think your flow creates an "uber" Docker image with all the necessary tools, and precompiled binaries are a way to avoid conflicts between the different tools.

In the pyinstaller step, the binaries are compiled specifically for your version of the (Alpine) OS. If these binaries were published, I think we'd need to keep them updated per OS version (worst case), which seems like a lot of ongoing work.

If your image and the awscurl Docker image are based on the same Alpine base image, the extra download should be rather small (i.e. Docker does the diff for you).

Maybe for this issue we could make the base image more reusable, e.g. by adjusting this line: https://github.com/okigan/awscurl/blob/master/Dockerfile#L1
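One way to check how much "Docker does the diff for you" in practice is to compare the layer lists of two local images. `docker image inspect --format '{{json .RootFS.Layers}}' IMAGE` prints an image's layer digests; because each layer is stored on top of the ones beneath it, roughly only a common leading run of layers is shared between two images. A sketch under those assumptions, with hypothetical function names; `image_layers` also assumes the `docker` CLI is on PATH.

```python
import json
import subprocess


def image_layers(image):
    """Layer digests of a local image, as listed by `docker image inspect`."""
    out = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{json .RootFS.Layers}}", image],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def shared_base_layers(layers_a, layers_b):
    """Leading layers two images have in common (their shared base).

    Layers are stacked, so sharing stops at the first position where the
    lists differ, even if identical digests appear again further up.
    """
    shared = []
    for a, b in zip(layers_a, layers_b):
        if a != b:
            break
        shared.append(a)
    return shared
```

For example, `shared_base_layers(image_layers("okigan/awscurl"), image_layers("my-ci-image"))`, where `my-ci-image` stands in for your own image, shows which layers would not need to be downloaded again.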

speller commented 1 year ago

From my experience, most Linux binaries work well under Alpine if they're compiled without external dependencies. In some cases, a binary compiled under Alpine (any version) may be required.
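The reason is the C library: Alpine ships musl, while most distributions use glibc. A dynamically linked ELF executable names its loader in a `PT_INTERP` program header (e.g. `/lib64/ld-linux-x86-64.so.2` for glibc on x86-64), and if that loader is absent the binary fails to start; a fully static executable has no such header. `readelf -l` reports this as "Requesting program interpreter". The sketch below reads the same header in plain Python; the function names are hypothetical, and the loader-name checks cover only the common glibc and musl cases. A PyInstaller bootloader is itself dynamically linked, which is presumably why the Dockerfiles above build on an Alpine base.

```python
import struct

PT_INTERP = 3  # program-header type that names the dynamic loader


def elf_interpreter(path):
    """Return the loader path an ELF executable requests, or None if it has none."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError(f"{path} is not an ELF file")
    is_64 = data[4] == 2                 # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        (phoff,) = struct.unpack_from(end + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 0x36)
    else:
        (phoff,) = struct.unpack_from(end + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 0x2A)
    for i in range(phnum):
        base = phoff + i * phentsize
        (p_type,) = struct.unpack_from(end + "I", data, base)
        if p_type != PT_INTERP:
            continue
        # p_offset / p_filesz sit at different offsets in 32- and 64-bit headers.
        if is_64:
            (offset,) = struct.unpack_from(end + "Q", data, base + 0x08)
            (size,) = struct.unpack_from(end + "Q", data, base + 0x20)
        else:
            (offset,) = struct.unpack_from(end + "I", data, base + 0x04)
            (size,) = struct.unpack_from(end + "I", data, base + 0x10)
        return data[offset:offset + size].rstrip(b"\0").decode()
    return None  # no PT_INTERP: a statically linked executable


def libc_flavour(interpreter):
    """Classify a loader path as 'static', 'musl', 'glibc' or 'unknown'."""
    if interpreter is None:
        return "static"
    name = interpreter.rsplit("/", 1)[-1]
    if name.startswith("ld-musl"):
        return "musl"
    if name.startswith("ld-linux"):
        return "glibc"
    return "unknown"
```

Running `libc_flavour(elf_interpreter("dist/awscurl"))` on a build would indicate whether it can be expected to run on Alpine ("musl" or "static") or only on glibc-based images.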

> Maybe the issue we could make the base image more reusable

Yes, if it contained a binary, that would solve my issue.

speller commented 1 year ago

Any updates?

okigan commented 1 year ago

So this is still not officially supported, but I've created a repo to build a standalone awscurl (mostly based on what you figured out above). The additional image size seems within expectations [see snapshot]. There is also a Makefile to build the standalone awscurl and run/test that it works.
