vitabaks / postgresql_cluster

PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.
MIT License

Proposal for Dockerfile Optimization with Multi-Stage Builds #359

Closed ThomasSanson closed 10 months ago

ThomasSanson commented 11 months ago

Hello,

I have been examining the Dockerfile located in .config/gitpod/Dockerfile of the postgresql_cluster project and I would like to propose an enhancement using multi-stage builds.

Multi-stage builds are a Docker feature that lets you structure a Dockerfile into multiple stages. Each stage is like a mini-Dockerfile with its own FROM statement. The final stage can selectively copy artifacts from the previous stages, leaving behind everything not explicitly copied. This separates build-time dependencies from runtime dependencies, which reduces the final image size and attack surface and makes your containers more secure and efficient.

The primary benefits of multi-stage builds include:

  1. Reduced Image Size: Only the artifacts that are needed at runtime are copied to the final image. This reduces the size of the image by leaving out the build tools and intermediate files that are not needed in production.

  2. Improved Readability and Maintainability: By splitting the Dockerfile into multiple stages, it becomes easier to read and maintain. Each stage has a clear purpose, which simplifies understanding and debugging.

  3. Faster Build Times: Docker can cache the results of each stage separately. This means that if you change something in one stage, Docker only needs to rebuild that stage and the ones after it.

  4. Reduced Attack Surface: By minimizing the contents of the final image, you also minimize the potential attack surface for a malicious actor. This makes your container more secure.

Here's how the Dockerfile might look with multi-stage builds:

# First stage: Install build-time dependencies
FROM ubuntu:jammy AS builder

USER root

# Copy Python version config file
COPY .config/python_version.config /tmp/

# Set a variable for the npm version
ARG NPM_VERSION=9.6.7
# Set a variable for the ungit version
ARG UNGIT_VERSION=1.5.23

# Update system and install packages
# hadolint ignore=DL3008,DL3013
RUN PYTHON_VERSION=$(cut -d '=' -f 2 /tmp/python_version.config) \
    && apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y --no-install-recommends \
        bash-completion \
        ca-certificates \
        curl \
        git \
        git-lfs \
        gnupg \
        htop \
        iproute2 \
        lsb-release \
        make \
        nano \
        python3-pip \
        "python${PYTHON_VERSION}" \
        "python${PYTHON_VERSION}-venv" \
        sudo \
        tree \
        vim \
        wget \
    && python3 -m pip install --no-cache-dir --upgrade pip \
    && python3 -m pip install --no-cache-dir virtualenv

# Install Docker
FROM builder AS docker-installer

RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
    && echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
    $(lsb_release -cs) stable" | tee /etc/apt/sources.list.d/docker.list > /dev/null \
    && apt-get update \
    && apt-get install -y --no-install-recommends docker-ce docker-ce-cli containerd.io

# Install npm and ungit
FROM docker-installer AS npm-installer

RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && npm install -g npm@${NPM_VERSION} \
    && npm install -g \
        ungit@${UNGIT_VERSION}

# Cleanup
FROM npm-installer AS cleanup

RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/*

# Final stage: Setup final image and user
FROM ubuntu:jammy

COPY --from=cleanup / /

# Create the gitpod user. UID must be 33333.
RUN useradd -l -u 33333 -G sudo -md /home/gitpod -s /bin/bash -p gitpod gitpod

USER gitpod

By applying this structure, we can make the Dockerfile more efficient and readable, and it reduces the size of the final image by leaving out unnecessary build-time dependencies.
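One caveat on the size claim: `COPY --from=cleanup / /` copies the entire filesystem of the previous stage, so every installed package still lands in the final image, and the main effect is a flattened set of layers. The size reduction described above comes from copying only selected paths. Below is a minimal sketch of that pattern, assuming a hypothetical `/opt/venv` virtual environment is the only artifact needed at runtime. The path and the package installed into it are illustrative and not taken from the project's Dockerfile.

```dockerfile
# Stage 1: build-time tooling lives only in this stage
FROM ubuntu:jammy AS builder

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-venv \
    && rm -rf /var/lib/apt/lists/*

# Build a virtual environment at a hypothetical path (/opt/venv is an assumption)
RUN python3 -m venv /opt/venv \
    && /opt/venv/bin/pip install --no-cache-dir virtualenv

# Stage 2: runtime image, starts again from the clean base
FROM ubuntu:jammy

# The interpreter the virtual environment links to must exist at the same path
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Copy only the artifact; python3-venv and the apt caches stay behind in the builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
```

For a development image such as the Gitpod one, most of the installed tools are needed at runtime, so the saving from this pattern may be small.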

Would you be open to considering this change? I would be happy to assist further with this enhancement.

Best,

vitabaks commented 11 months ago

It may be worth using a slim version of the image, for example the Debian slim version.

ThomasSanson commented 11 months ago

@vitabaks,

I've looked into it, and it might be a bit challenging to use the debian:stable-slim version. Here's an example of the error:

E: Unable to locate package python3.10-venv
E: Couldn't find any package by glob 'python3.10-venv'
E: Couldn't find any package by regex 'python3.10-venv'

Furthermore, when comparing sizes:

debian:stable-slim: 29.95 MB
ubuntu:jammy: 28.17 MB

Personally, I would lean towards sticking with Ubuntu. What do you think?

vitabaks commented 11 months ago

There is not much difference between Ubuntu and Debian for a development environment.

Just for information: For a production environment, I usually choose Debian.

ThomasSanson commented 11 months ago

I use Rocky Linux or Debian in production. Alternatively, I go straight to the Alpine-based image of the technical stack itself.

You can find the Rocky Linux tags at: https://hub.docker.com/_/rockylinux/tags

Or, if you prefer Alpine, you can explore the PostgreSQL 15.3 Alpine image at: https://hub.docker.com/layers/library/postgres/15.3-alpine/images/sha256-58a4e7ae605e8e247180ebba1cc3758ab20677e9a5221ab3150a74f47938b8a1?context=explore

vitabaks commented 11 months ago

> I've looked into it, and it might be a bit challenging to use the debian:stable-slim version. Here's an example of the error:

If the python3.10-venv package is not available from the main repositories, we can use other sources (e.g. the deadsnakes PPA), or install another version of Python.

Try

add-apt-repository -y ppa:deadsnakes/ppa
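In a Dockerfile, that command needs a few supporting packages first. The sketch below is hedged: the deadsnakes PPA publishes builds for Ubuntu releases, so it stays on ubuntu:jammy, and whether the same approach works from debian:stable-slim is not established in this thread. The Python version (3.11) is an assumption chosen for illustration.

```dockerfile
FROM ubuntu:jammy

# software-properties-common provides add-apt-repository;
# ca-certificates and gnupg are included so the PPA key can be fetched and imported
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        gnupg \
        software-properties-common \
    && add-apt-repository -y ppa:deadsnakes/ppa \
    && apt-get update \
    # python3.11 is an assumed version, not one named in this thread
    && apt-get install -y --no-install-recommends \
        python3.11 \
        python3.11-venv \
    && rm -rf /var/lib/apt/lists/*
```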

ThomasSanson commented 11 months ago

@vitabaks Yes, indeed, I don't give up at the slightest obstacle, haha.

On a more serious note, I understand that we can overcome the challenge, but I don't see any benefit significant enough to justify the effort of switching the development environment from ubuntu:jammy to debian:stable-slim.

Unless I'm missing something?

vitabaks commented 11 months ago

Yes, the image sizes you provided show that there is no need to switch to the slim version of Debian. What about a slim version of Ubuntu?

ThomasSanson commented 11 months ago

I couldn't find a specifically labeled "slim" version for Ubuntu on the official Docker Hub repository: https://hub.docker.com/_/ubuntu/tags

However, it appears that most of the Ubuntu images available are already slim by default.

vitabaks commented 11 months ago

Ok, thanks for checking.