Open cclauss opened 3 years ago
@cclauss, I'd like to work on reducing the size of the images using multi-stage builds. I also see that this is related to #2656 and #4896; I'm mentioning them together so they're all linked up, since they share the common goal of reducing image size.
Awesome. Please start by adding a "before" list of our current Docker images and their sizes here.
A fourth idea would be to use Alpine images (https://hub.docker.com/layers/library/python/3.11.1-alpine, for instance). Is that something you have already tried?
Edit: this seems to be a bad idea after all.
I hit a wall on this project. I was able to sketch out a multi-stage `olbase` image, but recreating the production environment locally (for testing) has proved to be a substantial challenge for me, in part because I had to guess at the config files in `/olsystem` used for HAProxy, etc.
I kept telling myself I would make it work, but I should be honest with myself and say that I don't think I am going to get it working.
Having said that, I think the biggest gains here would come from making `olbase` as small as possible, including only the minimum needed in each `Dockerfile`, and then copying data from earlier stages to reduce build time and leftover build artifacts (for example, by dropping build-only dependencies).
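A minimal sketch of that idea, assuming the Python dependencies can be installed into a virtualenv: compilers and `-dev` headers exist only in the build stage, and the run stage receives the finished virtualenv through a single `COPY --from`. The virtualenv is a swapped-in technique (the attached `Dockerfile.olbase` uses wheels instead), and the runtime package list (`libpq5`, `libxml2`, `libxslt1.1`) is an assumption that should be checked against what the installed packages actually link to.

```dockerfile
ARG PYTHON_VERSION=3.11.2-slim-bullseye

# Build stage: compilers and -dev headers exist only here
FROM python:${PYTHON_VERSION} AS build-stage
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
        build-essential libpq-dev libxml2-dev libxslt-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*
# Install every Python dependency into a self-contained virtualenv
RUN python -m venv /venv
ENV PATH=/venv/bin:$PATH
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Run stage: starts again from the clean base image
FROM python:${PYTHON_VERSION} AS run-stage
# ASSUMPTION: these are the only shared libraries the compiled packages
# need at runtime; confirm with ldd before relying on this list
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
        libpq5 libxml2 libxslt1.1 postgresql-client \
    && rm -rf /var/lib/apt/lists/*
# Only the finished virtualenv is carried over; build tools stay behind
COPY --from=build-stage /venv /venv
ENV PATH=/venv/bin:$PATH
```

Because both stages start from the same base image, the virtualenv's interpreter symlinks still resolve after the copy.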
Here, the biggest single size reduction (~817 MB) would come from dropping `node_modules` from `olbase` and only including it where needed (i.e. `oldev`), if indeed it isn't needed in production.
I wanted to test this on the theory that once webpack compiles everything, `node_modules` is perhaps no longer needed; for other projects I generally drop that directory. But that's where I hit the testing/recreating-production issue.
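One way to test that theory is to run `npm ci` and the webpack build in a throwaway Node stage and copy only the compiled output into the runtime image. This is a sketch under two hypothetical assumptions: an npm script named `build` that runs webpack, and an output directory of `static/build`. The real build is driven by `make`, so the actual command and output path may differ. The `client-builder` stage name follows the commented-out `FROM node:16-bullseye-slim as client-builder` line in the attached Dockerfile.

```dockerfile
ARG PYTHON_VERSION=3.11.2-slim-bullseye

# Throwaway stage: Node.js and node_modules exist only here
FROM node:16-bullseye-slim AS client-builder
WORKDIR /openlibrary
COPY package*.json ./
RUN npm ci
COPY . .
# HYPOTHETICAL: assumes an npm script called "build" that writes the
# compiled webpack output to static/build
RUN npm run build

# Runtime stage: receives only the compiled assets, with no Node.js,
# no npm and no node_modules
FROM python:${PYTHON_VERSION} AS run-stage
WORKDIR /openlibrary
COPY . .
COPY --from=client-builder /openlibrary/static/build ./static/build
```

Listing `node_modules` in `.dockerignore` keeps the `COPY . .` in the runtime stage from pulling a local `node_modules` back in from the build context.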
For reference, I've attached a multi-stage `Dockerfile.olbase` along with the size of each layer. It only cuts about 200 MB off the current build. But if the production image doesn't need the editors and the like built in, and those can also be cut from `olbase` and included only where necessary, that could perhaps save another 400-500 MB (untested) on top of the potential ~817 MB from `node_modules`, if that can be dropped.
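A sketch of keeping the editors out of production while still offering them to developers, using two build targets in one Dockerfile. The stage names `olbase-prod` and `olbase-dev` are hypothetical, and the split of packages between them is only illustrative:

```dockerfile
ARG PYTHON_VERSION=3.11.2-slim-bullseye

# Production base: only tools needed at runtime
FROM python:${PYTHON_VERSION} AS olbase-prod
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
        postgresql-client curl screen parallel zip unzip lftp \
    && rm -rf /var/lib/apt/lists/*

# Developer base: the "Editors (for our convenience)" from Dockerfile.olbase,
# layered on top of the production base
FROM olbase-prod AS olbase-dev
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
        vim-nox emacs-nox \
    && rm -rf /var/lib/apt/lists/*
```

`docker build --target olbase-prod .` stops after the first stage, so the editors never enter that image, while `--target olbase-dev` builds both stages.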
Layer sizes on a hypothetical multi-stage `olbase` that keeps the text editors and `node_modules`:
IMAGE CREATED CREATED BY SIZE COMMENT
5e42cf30cbbf 20 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c ln -s ve… 31.6MB
68b678579b0f 21 minutes ago /bin/sh -c #(nop) COPY --chown=openlibrary:o… 195MB
e8766ec31800 21 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c npm ci 817MB
4d14949b35a0 23 minutes ago /bin/sh -c #(nop) COPY --chown=openlibrary:o… 1.83MB
55cb59529b9f 23 minutes ago /bin/sh -c #(nop) USER openlibrary 0B
3d40aeef6889 23 minutes ago /bin/sh -c #(nop) WORKDIR /openlibrary 0B
ab8b4bdf71e2 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c mkdir -p… 0B
bba0000ea342 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c rm /usr/… 20.3MB
e9fb5277055a 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c apt-get … 9.87MB
5bd33c36587d 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c echo "de… 59B
ce2502930b56 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c wget -O … 1.19kB
9afb6f980fea 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c apt-get … 18.5MB
2f760aa2b3aa 23 minutes ago /bin/sh -c #(nop) USER root 0B
8d1af785fe57 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c pip inst… 104MB
8c8f917a7ddb 23 minutes ago /bin/sh -c #(nop) COPY dir:fcb2c08dfcb2af437… 25.1MB
e26a921d83a6 23 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c curl -sL… 135MB
5b5b51f9575d 24 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c apt-get … 543MB
c0710a1c67f7 24 minutes ago |1 APP_HOME=/openlibrary /bin/sh -c groupadd… 7.28kB
0c2574501b54 24 minutes ago /bin/sh -c #(nop) ENV LC_ALL=POSIX 0B
0e26c0831152 24 minutes ago /bin/sh -c #(nop) ENV LANG=en_US.UTF-8 0B
de8dba5a1862 24 minutes ago /bin/sh -c #(nop) ARG APP_HOME=/openlibrary 0B
281d606e8b48 46 hours ago /bin/sh -c #(nop) CMD ["python3"] 0B
<missing> 46 hours ago /bin/sh -c set -eux; savedAptMark="$(apt-m… 12.1MB
<missing> 46 hours ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_SHA256… 0B
<missing> 46 hours ago /bin/sh -c #(nop) ENV PYTHON_GET_PIP_URL=ht… 0B
<missing> 46 hours ago /bin/sh -c #(nop) ENV PYTHON_SETUPTOOLS_VER… 0B
<missing> 46 hours ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=22… 0B
<missing> 46 hours ago /bin/sh -c set -eux; for src in idle3 pydoc… 0B
<missing> 46 hours ago /bin/sh -c set -eux; savedAptMark="$(apt-m… 32MB
<missing> 46 hours ago /bin/sh -c #(nop) ENV PYTHON_VERSION=3.11.2 0B
<missing> 46 hours ago /bin/sh -c #(nop) ENV GPG_KEY=A035C8C19219B… 0B
<missing> 47 hours ago /bin/sh -c set -eux; apt-get update; apt-g… 3.1MB
<missing> 47 hours ago /bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
<missing> 47 hours ago /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/… 0B
<missing> 2 days ago /bin/sh -c #(nop) CMD ["bash"] 0B
<missing> 2 days ago /bin/sh -c #(nop) ADD file:3ea7c69e4bfac2ebb… 80.5MB
The same information, but with `--no-trunc`, to make it a bit easier to match the layers to the steps in `Dockerfile.olbase`:
CREATED CREATED BY SIZE COMMENT
|1 APP_HOME=/openlibrary /bin/sh -c ln -s vendor/infogami/infogami infogami && make && python -m pip list --outdated # TODO: What is this for? Just general information? 31.6MB
/bin/sh -c #(nop) COPY --chown=openlibrary:openlibrarydir:4aef3fc5af16b432f5e698fa10753a4ce7c18db4c8fddae384511f6204756f5e in /openlibrary 195MB
|1 APP_HOME=/openlibrary /bin/sh -c npm ci 817MB
/bin/sh -c #(nop) COPY --chown=openlibrary:openlibrarymulti:ea10c9c467755f5278364ae495d0a251e994e2eeb4625d107b8769b21fc712a5 in ./ 1.83MB
/bin/sh -c #(nop) USER openlibrary 0B
/bin/sh -c #(nop) WORKDIR /openlibrary 0B
|1 APP_HOME=/openlibrary /bin/sh -c mkdir -p /var/log/openlibrary /var/lib/openlibrary && chown openlibrary:openlibrary /var/log/openlibrary /var/lib/openlibrary && mkdir /openlibrary && chown openlibrary:openlibrary /openlibrary && mkdir -p /var/lib/coverstore && chown openlibrary:openlibrary /var/lib/coverstore && mkdir -p /solr-updater-data && chown openlibrary:openlibrary /solr-updater-data 0B
|1 APP_HOME=/openlibrary /bin/sh -c rm /usr/sbin/nginx && curl -L https://archive.org/download/nginx/nginx -o /usr/sbin/nginx && chmod +x /usr/sbin/nginx && rm /etc/nginx/sites-enabled/default 20.3MB
|1 APP_HOME=/openlibrary /bin/sh -c apt-get update && apt-get -y install --no-install-recommends openresty 9.87MB
|1 APP_HOME=/openlibrary /bin/sh -c echo "deb http://openresty.org/package/debian $(lsb_release -sc) openresty" | tee /etc/apt/sources.list.d/openresty.list 59B
|1 APP_HOME=/openlibrary /bin/sh -c wget -O - https://openresty.org/package/pubkey.gpg | apt-key add - 1.19kB
|1 APP_HOME=/openlibrary /bin/sh -c apt-get update && apt-get install -y --no-install-recommends nginx curl apt-transport-https lsb-release ca-certificates wget logrotate 18.5MB
/bin/sh -c #(nop) USER root 0B
|1 APP_HOME=/openlibrary /bin/sh -c pip install --root-user-action=ignore --no-cache-dir --no-index --find-links=/wheels/ /wheels/* && rm -rf /wheels/ 104MB
/bin/sh -c #(nop) COPY dir:fcb2c08dfcb2af437f0d6f34ffc456df958c1e5e16e0a36a57c64d34debef24f in /wheels/ 25.1MB
|1 APP_HOME=/openlibrary /bin/sh -c curl -sL https://deb.nodesource.com/setup_16.x | bash - && apt-get install -y --no-install-recommends nodejs 135MB
|1 APP_HOME=/openlibrary /bin/sh -c apt-get -qq update && apt-get install -y --no-install-recommends build-essential postgresql-client git libpq-dev libxml2-dev libxslt-dev libffi-dev curl screen vim-nox emacs-nox parallel zip unzip lftp && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false && rm -rf /var/lib/apt/lists/* 543MB
|1 APP_HOME=/openlibrary /bin/sh -c groupadd --system --gid 999 openlibrary && useradd --no-log-init --system -u 999 --gid openlibrary --create-home openlibrary 7.28kB
/bin/sh -c #(nop) ENV LC_ALL=POSIX 0B
/bin/sh -c #(nop) ENV LANG=en_US.UTF-8 0B
/bin/sh -c #(nop) ARG APP_HOME=/openlibrary 0B
/bin/sh -c #(nop) CMD ["python3"] 0B
/bin/sh -c set -eux; savedAptMark="$(apt-mark showmanual)"; apt-get update; apt-get install -y --no-install-recommends wget; wget -O get-pip.py "$PYTHON_GET_PIP_URL"; echo "$PYTHON_GET_PIP_SHA256 *get-pip.py" | sha256sum -c -; apt-mark auto '.*' > /dev/null; [ -z "$savedAptMark" ] || apt-mark manual $savedAptMark > /dev/null; apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; rm -rf /var/lib/apt/lists/*; export PYTHONDONTWRITEBYTECODE=1; python get-pip.py --disable-pip-version-check --no-cache-dir --no-compile "pip==$PYTHON_PIP_VERSION" "setuptools==$PYTHON_SETUPTOOLS_VERSION" ; rm -f get-pip.py; pip --version 12.1MB
/bin/sh -c #(nop) ENV PYTHON_GET_PIP_SHA256=d1d09b0f9e745610657a528689ba3ea44a73bd19c60f4c954271b790c71c2653 0B
/bin/sh -c #(nop) ENV PYTHON_GET_PIP_URL=https://github.com/pypa/get-pip/raw/1a96dc5acd0303c4700e02655aefd3bc68c78958/public/get-pip.py 0B
/bin/sh -c #(nop) ENV PYTHON_SETUPTOOLS_VERSION=65.5.1 0B
/bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=22.3.1 0B
/bin/sh -c set -eux; for src in idle3 pydoc3 python3 python3-config; do dst="$(echo "$src" | tr -d 3)"; [ -s "/usr/local/bin/$src" ]; [ ! -e "/usr/local/bin/$dst" ]; ln -svT "$src" "/usr/local/bin/$dst"; done 0B
/bin/sh -c set -eux; savedAptMark="$(apt-mark showmanual)"; apt-get update; apt-get install -y --no-install-recommends dpkg-dev gcc gnupg dirmngr libbluetooth-dev libbz2-dev libc6-dev libexpat1-dev libffi-dev libgdbm-dev liblzma-dev libncursesw5-dev libreadline-dev libsqlite3-dev libssl-dev make tk-dev uuid-dev wget xz-utils zlib1g-dev ; wget -O python.tar.xz "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz"; wget -O python.tar.xz.asc "https://www.python.org/ftp/python/${PYTHON_VERSION%%[a-z]*}/Python-$PYTHON_VERSION.tar.xz.asc"; GNUPGHOME="$(mktemp -d)"; export GNUPGHOME; gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys "$GPG_KEY"; gpg --batch --verify python.tar.xz.asc python.tar.xz; command -v gpgconf > /dev/null && gpgconf --kill all || :; rm -rf "$GNUPGHOME" python.tar.xz.asc; mkdir -p /usr/src/python; tar --extract --directory /usr/src/python --strip-components=1 --file python.tar.xz; rm python.tar.xz; cd /usr/src/python; gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)"; ./configure --build="$gnuArch" --enable-loadable-sqlite-extensions --enable-optimizations --enable-option-checking=fatal --enable-shared --with-lto --with-system-expat --without-ensurepip ; nproc="$(nproc)"; LDFLAGS="-Wl,--strip-all"; make -j "$nproc" "EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" "LDFLAGS=${LDFLAGS:-}" "PROFILE_TASK=${PROFILE_TASK:-}" ; rm python; make -j "$nproc" "EXTRA_CFLAGS=${EXTRA_CFLAGS:-}" "LDFLAGS=${LDFLAGS:--Wl},-rpath='\$\$ORIGIN/../lib'" "PROFILE_TASK=${PROFILE_TASK:-}" python ; make install; cd /; rm -rf /usr/src/python; find /usr/local -depth \( \( -type d -a \( -name test -o -name tests -o -name idle_test \) \) -o \( -type f -a \( -name '*.pyc' -o -name '*.pyo' -o -name 'libpython*.a' \) \) \) -exec rm -rf '{}' + ; ldconfig; apt-mark auto '.*' > /dev/null; apt-mark manual $savedAptMark; find /usr/local -type f -executable -not \( -name '*tkinter*' \) -exec ldd '{}' ';' | awk '/=>/ { print $(NF-1) }' | sort -u | 
xargs -r dpkg-query --search | cut -d: -f1 | sort -u | xargs -r apt-mark manual ; apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false; rm -rf /var/lib/apt/lists/*; python3 --version 32MB
/bin/sh -c #(nop) ENV PYTHON_VERSION=3.11.2 0B
/bin/sh -c #(nop) ENV GPG_KEY=A035C8C19219BA821ECEA86B64E628F8D684696D 0B
/bin/sh -c set -eux; apt-get update; apt-get install -y --no-install-recommends ca-certificates netbase tzdata ; rm -rf /var/lib/apt/lists/* 3.1MB
/bin/sh -c #(nop) ENV LANG=C.UTF-8 0B
/bin/sh -c #(nop) ENV PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 0B
/bin/sh -c #(nop) CMD ["bash"] 0B
/bin/sh -c #(nop) ADD file:3ea7c69e4bfac2ebb6f86baaedab31827c86a594dba8080a49928e211ad3c7a0 in / 80.5MB
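Both listings also show a smaller, related effect: the `COPY dir:... in /wheels/` layer (25.1 MB) stays in the image even though the following `pip install ... && rm -rf /wheels/` step deletes the directory, because a later layer cannot shrink an earlier one. A minimal sketch of avoiding that, assuming the image is built with BuildKit (which `RUN --mount` requires), mounts the wheels from the build stage only for the duration of the install step:

```dockerfile
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11.2-slim-bullseye

FROM python:${PYTHON_VERSION} AS build-stage
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
        build-essential libpq-dev libxml2-dev libxslt-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*
COPY ./requirements.txt .
RUN pip wheel --wheel-dir /usr/src/openlibrary/wheels --default-timeout=100 \
        -r requirements.txt

FROM python:${PYTHON_VERSION} AS run-stage
# The wheels are bind-mounted read-only for this single RUN step, so they
# never become a layer of run-stage and no "rm -rf /wheels/" is needed.
# Runtime shared libraries (e.g. libpq5) are omitted here for brevity.
RUN --mount=type=bind,from=build-stage,source=/usr/src/openlibrary/wheels,target=/wheels \
    pip install --root-user-action=ignore --no-cache-dir --no-index \
        --find-links=/wheels/ /wheels/*
```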
A hypothetical multi-stage Dockerfile that drops some build dependencies but still keeps all the editors and `node_modules`:
Dockerfile.olbase.txt
ARG PYTHON_VERSION=3.11.2-slim-bullseye
# FROM node:16-bullseye-slim as client-builder
FROM python:${PYTHON_VERSION} as python
# Build stage
FROM python as build-stage
# Build dependencies
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
build-essential \
libpq-dev \
libxml2-dev \
libxslt-dev \
libffi-dev
# Install and cache python packages here to use elsewhere.
COPY ./requirements.txt .
RUN pip wheel --wheel-dir /usr/src/openlibrary/wheels --default-timeout=100 \
-r requirements.txt
# Run stage
FROM python as run-stage
ARG APP_HOME=/openlibrary
# WORKDIR ${APP_HOME}
ENV LANG en_US.UTF-8
# required for postgres
ENV LC_ALL POSIX
# Create openlibrary users
# We use 999:999 for the openlibrary user. Any volume mounts which require read/write
# access by the container should be set to this user. Ideally we would use a number
# larger than 10,000 to avoid host OS uid/gid conflicts, but this is what we have
# at the moment.
RUN groupadd --system --gid 999 openlibrary \
&& useradd --no-log-init --system -u 999 --gid openlibrary --create-home openlibrary
# Install required system dependencies.
RUN apt-get -qq update && apt-get install -y --no-install-recommends \
build-essential \
postgresql-client \
git \
libpq-dev \
libxml2-dev \
libxslt-dev \
libffi-dev \
curl \
screen \
# Editors (for our convenience)
# What about -nox for both vim and emacs?
vim-nox \
emacs-nox \
# util for running things in parallel
parallel \
# automatic import pipeline dependencies
zip \
unzip \
lftp \
&& apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
&& rm -rf /var/lib/apt/lists/*
# Install LTS version of node.js
RUN curl -sL https://deb.nodesource.com/setup_16.x | bash - \
&& apt-get install -y --no-install-recommends nodejs
# Copy and install dependencies into the run stage from the build stage.
# ENV PIP_ROOT_USER_ACTION=ignore
COPY --from=build-stage /usr/src/openlibrary/wheels /wheels/
RUN pip install --root-user-action=ignore --no-cache-dir --no-index \
--find-links=/wheels/ /wheels/* \
&& rm -rf /wheels/
# Install Archive.org nginx w/ IP anonymization
USER root
# TODO: curl is already installed
RUN apt-get update && apt-get install -y --no-install-recommends nginx curl \
# nginx-plus
# TODO: why is lsb-release in here? That has a python3 dependency.
apt-transport-https lsb-release ca-certificates wget \
# log rotation service for ol-nginx
logrotate
RUN wget -O - https://openresty.org/package/pubkey.gpg | apt-key add -
RUN echo "deb http://openresty.org/package/debian $(lsb_release -sc) openresty" \
| tee /etc/apt/sources.list.d/openresty.list
RUN apt-get update && apt-get -y install --no-install-recommends openresty
RUN rm /usr/sbin/nginx \
&& curl -L https://archive.org/download/nginx/nginx -o /usr/sbin/nginx \
&& chmod +x /usr/sbin/nginx \
# Remove the stock nginx config file
&& rm /etc/nginx/sites-enabled/default
RUN mkdir -p /var/log/openlibrary /var/lib/openlibrary && chown openlibrary:openlibrary /var/log/openlibrary /var/lib/openlibrary \
&& mkdir /openlibrary && chown openlibrary:openlibrary /openlibrary \
&& mkdir -p /var/lib/coverstore && chown openlibrary:openlibrary /var/lib/coverstore \
# In order to write to solr-updater's named volume, this needs to be
# pre-created with the right permissions
&& mkdir -p /solr-updater-data && chown openlibrary:openlibrary /solr-updater-data
WORKDIR ${APP_HOME}
# Link the ia CLI binary into /usr/local/bin so that it shows up
# on the PATH. Do this instead of trying to modify the PATH, because
# that causes headaches with su, cron, etc.
# USER root
# RUN ln -s /home/openlibrary/.local/bin/ia /usr/local/bin/ia
USER openlibrary
COPY --chown=openlibrary:openlibrary package*.json ./
RUN npm ci
COPY --chown=openlibrary:openlibrary . ${APP_HOME}
# run make to initialize git submodules, build css and js files
RUN ln -s vendor/infogami/infogami infogami \
&& make \
&& python -m pip list --outdated # TODO: What is this for? Just general information?
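The nginx/OpenResty portion of the Dockerfile above spans four separate `RUN` steps, and none of them deletes `/var/lib/apt/lists/` after `apt-get update`, so the package indexes are stored in those layers. A sketch of the same steps folded into one `RUN`; `gnupg` is added to the package list on the assumption that `apt-key` needs it in a standalone build, and the archive.org nginx binary swap is left out for brevity:

```dockerfile
FROM python:3.11.2-slim-bullseye

# One RUN step: install nginx and helpers, register the OpenResty apt
# repository, install OpenResty, then delete the apt package lists so
# they are not baked into the layer.
# ASSUMPTION: gnupg is added because apt-key needs it in a clean image.
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        nginx curl apt-transport-https lsb-release ca-certificates wget \
        gnupg logrotate \
    && wget -O - https://openresty.org/package/pubkey.gpg | apt-key add - \
    && echo "deb http://openresty.org/package/debian $(lsb_release -sc) openresty" \
        > /etc/apt/sources.list.d/openresty.list \
    && apt-get update \
    && apt-get install -y --no-install-recommends openresty \
    && rm -rf /var/lib/apt/lists/*
```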
When doing a deployment, I am frustrated that our Docker images are 2.5 GB and take a long time to copy to multiple servers. We currently build production images on `ol-home0` and then copy them to `ol-covers1`, `ol-web1`, and `ol-web2`. The size of our images slows this process.
Describe the problem that you'd like solved
Reduce the size of our Docker images and transfer them more efficiently to accelerate our deployments.
Proposal & Constraints
`apt-get install` commands into a single command
Additional context
Stakeholders