Closed: wwood closed this issue 9 months ago
Hello @wwood
It is unclear to me what you are trying to accomplish. It seems to me that, at the end of your first build stage, you had the image you wanted.
What do you gain from doing these two operations?
FROM scratch
COPY --from=0 / /
Thanks for the quick response. Doing that reduces the size of the image pretty dramatically for me (and I imagine for many/most others too) because it removes the layer history.
I'd love to see complete examples.
Here is what I just tried:
FROM mambaorg/micromamba:1.5.6
RUN micromamba install -y -n base -c conda-forge \
        pyopenssl=20.0.1 \
        python=3.9.1 \
        requests=2.25.1 && \
    micromamba clean --all --yes
resulting in
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
issue405 single_stage 3668a4d9e5f0 5 seconds ago linux/arm64 278.3 MiB 80.0 MiB
and then I tried
FROM mambaorg/micromamba:1.5.6
RUN micromamba install -y -n base -c conda-forge \
        pyopenssl=20.0.1 \
        python=3.9.1 \
        requests=2.25.1 && \
    micromamba clean --all --yes
FROM scratch
COPY --from=0 / /
resulting in
REPOSITORY TAG IMAGE ID CREATED PLATFORM SIZE BLOB SIZE
issue405 scratch e70df0fe75dc About a minute ago linux/arm64 278.1 MiB 79.9 MiB
As expected, copying to scratch results in a very slightly smaller image. I imagine this is due to a reduction in the metadata for the layers. I don't find this size difference to be compelling.
Sure. The sizes don't quite match up exactly with what I was saying above because I simplified the Dockerfile, but here it is. Sorry for the remaining complexity.
FROM mambaorg/micromamba:1.5.6
# Don't need all of the dependencies of singlem, because only pipe is going to be run.
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes
# (otherwise python will not be found)
ARG MAMBA_DOCKERFILE_ACTIVATE=1
# NOTE: The following 2 hashes should be changed in sync.
ENV SINGLEM_COMMIT b27c15b0
ENV SINGLEM_VERSION 0.16.0-dev4
RUN rm -rf singlem && \
    git init singlem && \
    cd singlem && \
    git remote add origin https://github.com/wwood/singlem && \
    git fetch origin && \
    git checkout $SINGLEM_COMMIT
RUN echo '__version__ = "'$SINGLEM_VERSION.${SINGLEM_COMMIT}'"' >singlem/singlem/version.py
# Remove bundled singlem packages
RUN rm -rfv singlem/singlem/data singlem/.git singlem/test singlem/appraise_plot.png
RUN pip install --no-deps kingfisher graftm
# Diamond - go via direct because conda-forge version is likely slower on
# account of not being compiled appropriately. Also, the conda version installs
# BLAST, which takes up space and we don't need.
RUN cd /tmp && curl -L 'https://github.com/bbuchfink/diamond/releases/download/v2.1.8/diamond-linux64.tar.gz' -O
RUN cd /tmp && \
    tar xf diamond-linux64.tar.gz && \
    cp diamond /opt/conda/bin/ && \
    rm diamond-linux64.tar.gz diamond
# Effectively add singlem to the PATH
RUN ln -s /tmp/singlem/bin/singlem /opt/conda/bin/singlem
RUN micromamba remove git -y
RUN micromamba clean -afy
# Test it out
# COPY --chown=$MAMBA_USER:$MAMBA_USER SRR8653040.sra /tmp/
# RUN singlem pipe --sra-files /tmp/SRR8653040.sra --no-assign-taxonomy --metapackage /mpkg --archive-otu-table /tmp/a.json --threads 4
# RUN rm /tmp/SRR8653040.sra /tmp/a.json
# Remove all the build dependencies / image layers for a smaller image overall
# FROM scratch
# COPY --from=0 / /
To build you need env.yaml
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python>=3.7
- biopython
- hmmer
- orfm
- extern
- sra-tools
- ncbi-ngs-sdk
- pip
- pandas # hopefully not needed for pipe --no-assign-taxonomy
- bird_tool_utils_python>=0.4.1
- zenodo_backpack
- sracat # usually installed via kingfisher, but we don't want all the kingfisher deps
- sqlalchemy
- git
- aria2 >=1.36.0 # For kingfisher aws-http
Uncommenting the last 2 lines of the Dockerfile changes the size from 1.84GB to 1.01GB:
REPOSITORY TAG IMAGE ID CREATED SIZE
<none> <none> fe55dfea5340 33 seconds ago 1.01GB
<none> <none> 1f7fb38237a7 50 seconds ago 1.84GB
I can't get your image to build:
180.4 error libmamba response code: -1 error message: Invalid argument
180.4 critical libmamba failed to execute pre/post link script for sra-tools
This may be because I'm on a Mac with an ARM processor, so I am using emulation, as the bioconda packages are amd64-only.
But I'm pretty sure much of what you are seeing is due to how you set up the layers. Adding files in one `RUN` command and then deleting them in another `RUN` command is going to bloat your image. Combine them into a single `RUN` command and the files that you added and then removed will not contribute to your layers. Your use of
FROM scratch
COPY --from=0 / /
is effectively cleaning up the inefficiencies you generated by adding and then deleting files in separate layers.
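To make that concrete, here is how the diamond steps from the Dockerfile above could be collapsed. This is a sketch (a fragment of the larger Dockerfile, not a complete one), contrasting the separate-`RUN` layout with a single combined `RUN`:

```Dockerfile
# Bloated: the tarball is committed into its own layer by the first
# RUN, so deleting it in a later RUN does not reclaim any space.
RUN cd /tmp && curl -L 'https://github.com/bbuchfink/diamond/releases/download/v2.1.8/diamond-linux64.tar.gz' -O
RUN cd /tmp && tar xf diamond-linux64.tar.gz && cp diamond /opt/conda/bin/
RUN rm /tmp/diamond-linux64.tar.gz /tmp/diamond

# Compact: download, install and delete in one RUN, so the tarball
# never appears in any committed layer.
RUN cd /tmp && \
    curl -L 'https://github.com/bbuchfink/diamond/releases/download/v2.1.8/diamond-linux64.tar.gz' -O && \
    tar xf diamond-linux64.tar.gz && \
    cp diamond /opt/conda/bin/ && \
    rm diamond-linux64.tar.gz diamond
```

With the compact form, the final image contains only the installed binary, regardless of whether a scratch stage follows.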
Yes, I imagine you are correct; that would probably work too, but only in limited circumstances. For instance, it isn't possible to COPY a file in, test that the program works, and then delete that file. It is also just annoying to develop a Dockerfile with tens of `&&` entries, because iterating takes longer since layers can't be reused.
The last two lines feels like a more general solution to me (after adding the extras mentioned in my initial comment).
However, this is just my 2c - I only raised this issue as a suggestion, so feel free to ignore. Thanks for the great work with micromamba-docker.
@wwood I understand your frustration about the clunkiness of batching commands with `&&` and being unable to delete files after they've been added.
> For instance it isn't possible to COPY a file in, test that the program works, and then delete that file.
It's possible to add a test stage to your Dockerfile so that the test stuff isn't in your main stage. Alternatively, you could run the tests in a separate step after building.
But ultimately the scripts in question that you're pointing out are only a few kilobytes, so there's no way they're causing your images to become large.
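Using the commented-out test steps from the Dockerfile above, such a test stage might look like this. This is a sketch: the stage names (`build`, `test`) are made up, and the installation steps are elided:

```Dockerfile
FROM mambaorg/micromamba:1.5.6 AS build
# ... all of the installation steps from the Dockerfile above ...

# Test stage: inherits everything from the build stage, adds the test
# input, and runs the pipeline. Nothing here reaches the final image.
FROM build AS test
COPY --chown=$MAMBA_USER:$MAMBA_USER SRR8653040.sra /tmp/
RUN singlem pipe --sra-files /tmp/SRR8653040.sra --no-assign-taxonomy \
      --metapackage /mpkg --archive-otu-table /tmp/a.json --threads 4

# Final stage: built only from the build stage.
FROM scratch
COPY --from=build / /
```

With BuildKit, `docker build --target test .` exercises the tests, while a plain `docker build .` builds only the stages the final stage depends on, skipping the test stage entirely.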
> It's possible to add a test stage to your Dockerfile so that the test stuff isn't in your main stage. Alternatively, you could run the tests in a separate step after building.
Didn't know this - thanks for the tip.
Hi,
I wanted to create a small(ish) image from micromamba, which I managed to accomplish using a multi-stage build, in particular by putting
FROM scratch
COPY --from=0 / /
at the end of my Dockerfile, and got an image ~1.2GB in size. Unfortunately this means the environment gets ruined somewhat. So I followed the instructions at https://micromamba-docker.readthedocs.io/en/latest/advanced_usage.html#adding-micromamba-to-an-existing-docker-image and that made it work again.
However, then the image was 2.2GB. This was due to the COPY directives in those instructions, which for me were superfluous because I'd already done
COPY --from=0 / /
but they added layers, causing the image size increase. Maybe a comment could be added above these COPYs? Or even better, if we are interested in smaller images, then maybe an example Dockerfile could be given in the docs, e.g.
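Such a docs example might look something like the following. This is only a sketch: the `ENV` and `ENTRYPOINT` values restated after `FROM scratch` are assumptions based on the micromamba image defaults, and would need to be checked against the advanced-usage docs:

```Dockerfile
FROM mambaorg/micromamba:1.5.6
COPY --chown=$MAMBA_USER:$MAMBA_USER env.yaml /tmp/env.yaml
RUN micromamba install -y -n base -f /tmp/env.yaml && \
    micromamba clean --all --yes

# Collapse every layer above into a single COPY layer,
# dropping the layer history.
FROM scratch
COPY --from=0 / /
# FROM scratch discards ENV/ENTRYPOINT/USER, so they must be restated;
# these values are assumed from the micromamba base image.
ENV MAMBA_ROOT_PREFIX=/opt/conda
ENTRYPOINT ["/usr/local/bin/_entrypoint.sh"]
```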