Open speller opened 2 years ago
@speller please add more info what/which dependency is causing issues.
@speller please add more info what/which dependency is causing issues.
I'm not proficient in Python, so I can't say what's missing. I don't know anything about compiling Python programs to binaries. But having a binary published in GitHub releases would be super useful. It would also be super nice if it worked under Alpine Linux.
The Docker images are actively used and are built on Alpine. See this: https://github.com/okigan/awscurl/blob/master/Dockerfile
(Email reply to Alexander Pravdin's comment of Sep 4, 2021, 11:43 PM.)
What's the binary path then? As I understand it, the entrypoint runs it as a Python script, not as a binary: ENTRYPOINT ["python", "-m", "awscurl.awscurl"]
Could you let me know which files I should copy from the awscurl Docker image to make it work locally in another Alpine-based image?
I've managed to compile it myself. Here is my Dockerfile code that builds awscli and awscurl. The awscurl part shares almost everything with the awscli setup process. I had no time to strip the awscli part and leave awscurl only. The difficulty is getting a pyinstaller bootloader binary for Alpine, which doesn't exist by default; that's why all these workarounds were made (aws-cli v2 doesn't have an official Alpine image).
# AWS CLI installation based on https://github.com/aws/aws-cli/issues/4685#issuecomment-829600284
ARG PYTHON_VERSION
ARG ALPINE_VERSION
ARG DOCKER_VERSION
FROM python:${PYTHON_VERSION}-alpine${ALPINE_VERSION} AS installer
RUN apk add --no-cache \
        curl \
        unzip \
        gcc \
        git \
        libc-dev \
        libffi-dev \
        openssl-dev \
        py3-pip \
        zlib-dev \
        make \
        cmake
ARG AWSCLI_VERSION
RUN git clone --recursive --depth 1 --branch ${AWSCLI_VERSION} --single-branch https://github.com/aws/aws-cli.git \
    && cd /aws-cli \
    # Follow https://github.com/six8/pyinstaller-alpine to install pyinstaller on alpine
    && pip install --no-cache-dir --upgrade pip \
    && pip install --no-cache-dir pycrypto \
    && git clone --depth 1 --single-branch --branch v$(grep PyInstaller requirements-build.txt | cut -d'=' -f3) https://github.com/pyinstaller/pyinstaller.git /tmp/pyinstaller \
    && cd /tmp/pyinstaller/bootloader \
    && CFLAGS="-Wno-stringop-overflow -Wno-stringop-truncation" python ./waf configure --no-lsb all \
    && pip install .. \
    && rm -Rf /tmp/pyinstaller \
    && cd - \
    && boto_ver=$(grep botocore setup.cfg | cut -d'=' -f3) \
    && git clone --single-branch --branch v2 https://github.com/boto/botocore /tmp/botocore \
    && cd /tmp/botocore \
    && git checkout $(git log --grep $boto_ver --pretty=format:"%h") \
    && pip install . \
    && rm -Rf /tmp/botocore \
    && cd - \
    && sed -i '/botocore/d' requirements.txt \
    && scripts/installers/make-exe \
    && unzip dist/awscli-exe.zip \
    && ./aws/install --bin-dir /aws-cli-bin
COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN cd / \
    && git clone --recursive --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl
...
FROM docker:${DOCKER_VERSION}
...
COPY --from=installer /usr/local/aws-cli/ /usr/local/aws-cli/
COPY --from=installer /aws-cli-bin/ /usr/local/bin/
COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl
Versions:
DOCKER_VERSION=20.10.8
AWSCLI_VERSION=2.2.32
AWSCURL_VERSION=0.24
PYTHON_VERSION=3.9.7
ALPINE_VERSION=3.14
The new entrypoint file cli.py is pretty standard:
from awscurl.__main__ import main

if __name__ == "__main__":
    main()
I guess you will need the requirements-build.txt file from awscli just for setup purposes if making an awscurl-only Dockerfile.
The following Dockerfile code compiles the binary under Python 3.9 on Alpine 3.16:
ARG PYTHON_VERSION
ARG DOCKER_VERSION
FROM python:${PYTHON_VERSION} AS installer
RUN set -ex; \
    apk add --no-cache \
        git \
        unzip \
        groff \
        curl \
        build-base \
        libffi-dev \
        cmake
COPY awscurl/cli.py /awscurl-cli.py
ARG AWSCURL_VERSION
RUN set -eux \
    && cd / \
    && git clone --recursive --depth 1 --branch v${AWSCURL_VERSION} --single-branch https://github.com/okigan/awscurl \
    && cd /awscurl \
    && pip install configargparse \
    && pip install requests \
    && pip install pyinstaller==4.10 \
    && cp /awscurl-cli.py cli.py \
    && pyinstaller cli.py --onefile --hidden-import=configargparse --hidden-import=requests --name awscurl
FROM docker:${DOCKER_VERSION}
COPY --from=installer /awscurl/dist/awscurl /usr/local/bin/awscurl
Versions:
DOCKER_VERSION=20.10.8
AWSCURL_VERSION=0.26
PYTHON_VERSION=3.9-alpine3.16
This allows adding only the binary to my image without pulling in Python and the raw sources. @okigan would you consider shipping only the binary in the Docker build instead of the sources? It doesn't make sense to pull Python when only awscurl is required, and it would also simplify adding awscurl to custom Docker images. Keeping images as small as possible matters in deployment pipelines where many images are downloaded often, and bigger images slow down the whole process.
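For reference, the second Dockerfile above could be built with its listed versions roughly as follows. This is only a sketch: the image tag ci-tools:awscurl is a made-up example name, and it assumes the build context contains awscurl/cli.py, since the Dockerfile copies that file. The script prints the command so it can be reviewed before running.

```shell
#!/bin/sh
# Sketch: assemble the `docker build` command for the awscurl-only Dockerfile.
set -eu

# Versions as listed in the comment above.
DOCKER_VERSION=20.10.8
AWSCURL_VERSION=0.26
PYTHON_VERSION=3.9-alpine3.16
# Hypothetical tag, pick any name you like.
IMAGE_TAG=ci-tools:awscurl

# Print the build command; each ARG in the Dockerfile gets a --build-arg.
build_command() {
    printf 'docker build --build-arg DOCKER_VERSION=%s --build-arg AWSCURL_VERSION=%s --build-arg PYTHON_VERSION=%s -t %s .\n' \
        "$DOCKER_VERSION" "$AWSCURL_VERSION" "$PYTHON_VERSION" "$IMAGE_TAG"
}

build_command
# To actually run the build from the directory with the Dockerfile:
#   eval "$(build_command)"
```

Once the image is built, running awscurl --help inside it is a quick way to confirm that the bundled binary starts.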
First of all, thank you for looking into this!
I have not used pyinstaller before, so I looked at the relevant docs. Some of the internal caveats make me concerned that this may trip up some users.
Also, if awscurl "was compiled to an executable", I would like more context on how it would be distributed/consumed. (Feel free to respond here or grab some time at https://calendly.com/okigan/30min)
@okigan My use cases:
1) Use the awscurl Docker image in a CI/CD environment when the job is not heavy and I need to perform some tasks with AWS. In this case, the size of the image matters: the smaller the image, the faster the job, and the faster the pipeline.
2) A complex job in a pipeline. In this case, I make a custom Docker image with a preinstalled set of the tools I need instead of downloading each tool as a Docker image or installing it in other ways. Here the size of the tools and the ease of installation matter. For awscurl, if there is an image with the binary, I only need to add one line to my Dockerfile:
COPY --from=okigan/awscurl /usr/local/bin/awscurl /usr/local/bin/awscurl
Otherwise, without the binary, I will have to install the sources and Python to make it work, which will increase the resulting image size significantly. You can see in my latest example that I use a multi-stage build to compile the binary and then copy only the binary to my image, dropping Python, the sources, and all dependencies. I add many tools to my image so, again, the size is important. I'm building my custom image on top of the Docker base image (which is based on Alpine) for my purposes. If you make an image with the binary only, you will most probably use the plain Alpine base image.
Does this clarify the context of the binary usage?
You may also redistribute the tool as precompiled binaries for different platforms if you wish. I install some tools in my images by downloading binaries. This also helps to save size and time.
So I think your flow creates an "uber" Docker image with all the necessary tools, and precompiled binaries are a way to avoid conflicts between the different tools.
In the pyinstaller step, the binaries are compiled for your specific version of the (Alpine) OS. If these binaries are published, I think we'd need to keep them updated per OS version in the worst case, which seems like a lot of ongoing work.
If your image and the awscurl Docker image are based on the same Alpine base image, the extra download should be rather small (i.e. Docker does the diff for you).
Maybe for this issue we could make the base image more reusable, i.e. by adjusting this line: https://github.com/okigan/awscurl/blob/master/Dockerfile#L1
From my experience, the majority of Linux binaries work well under Alpine if they're compiled without external dependencies. In some cases, binaries compiled specifically under Alpine may be required.
Maybe for this issue we could make the base image more reusable
Yes, if it contained a binary, that would solve my issue.
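On the question of which systems a compiled binary will run on, one rough check is to look at which C library it links against. The sketch below classifies the output of ldd; the helper name libc_flavor is made up for illustration, and it assumes the usual ldd output formats on musl (Alpine) and glibc systems. A pyinstaller one-file binary is typically dynamically linked against the libc of the system it was built on, so a build made on Alpine would be expected to report musl.

```shell
#!/bin/sh
# Sketch: guess which C library a binary was linked against from `ldd` output.
set -eu

# Read `ldd` output on stdin and print musl, glibc, or unknown.
libc_flavor() {
    out=$(cat)
    case "$out" in
        *musl*)      echo musl ;;     # e.g. libc.musl-x86_64.so.1 on Alpine
        *libc.so.6*) echo glibc ;;    # e.g. libc.so.6 on Debian/Ubuntu
        *)           echo unknown ;;  # static binary or unrecognized output
    esac
}

# Only inspect a file when a path is given as the first argument.
if [ "$#" -gt 0 ]; then
    ldd "$1" | libc_flavor
fi
```

Saved under a name of your choice, the script would be called with the path to the binary, for example /usr/local/bin/awscurl inside the built image.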
Any updates?
So this is still not officially supported, but I've created a repo to build a standalone awscurl (mostly based on what you've figured out above). The additional image size seems within the expected range [see snapshot]. There is also a Makefile to build the standalone awscurl and run/test that it works.
It would be nice to be able to compile awscurl to a binary for optimal disk space usage, especially in Docker containers. It's hard to pull in all the dependencies required for the program to work. In particular, I'm building an image for CI/CD that will have awscurl installed.