jboss-dockerfiles / dogen

Simple Dockerfile generator
MIT License

Generated Dockerfiles Do Not Follow Best Practices #198

Open cpitman opened 7 years ago

cpitman commented 7 years ago

The generated Dockerfiles are suboptimal and do not follow standard best practices:

  1. Lots of extra layers created with no effect, for example many USER layers
  2. Commands that can be concatenated into fewer layers (like RUN) are not
  3. Files are removed in later layers, literally making the images larger because of the COW filesystem

An example is below. A handcrafted version of this Dockerfile would be much cleaner and would result in a smaller image.

# Copyright 2017 Red Hat
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ------------------------------------------------------------------------
#
# This is a Dockerfile for the redhat-openjdk-18/openjdk18-openshift:1.1 image.

FROM sha256:10b16692948966fb5902ba92186c79d78818463f1d90b542e70a82a3e1339667

# Environment variables
ENV JBOSS_IMAGE_NAME="redhat-openjdk-18/openjdk18-openshift" \
    JBOSS_IMAGE_VERSION="1.1" \
    MAVEN_VERSION="3.3.9-2.8.el7" \
    JOLOKIA_VERSION="1.3.6" \
    PATH="$PATH:"/usr/local/s2i"" \
    AB_JOLOKIA_PASSWORD_RANDOM="true" \
    AB_JOLOKIA_AUTH_OPENSHIFT="true" \
    JAVA_DATA_DIR="/deployments/data"

# Labels
LABEL name="$JBOSS_IMAGE_NAME" \
      version="$JBOSS_IMAGE_VERSION" \
      release="11" \
      architecture="x86_64" \
      com.redhat.component="redhat-openjdk-18-openjdk18-openshift-docker" \
      io.openshift.s2i.scripts-url="image:///usr/local/s2i" \
      io.fabric8.s2i.version.maven="3.3.9-2.8" \
      io.fabric8.s2i.version.jolokia="1.3.6" \
      io.k8s.description="Platform for building and running plain Java applications (fat-jar and flat classpath)" \
      io.k8s.display-name="Java Applications" \
      io.openshift.tags="builder,java" \
      io.openshift.s2i.destination="/tmp" \
      org.jboss.deployments-dir="/deployments"

# Exposed ports
EXPOSE 8080 8443 8778

USER root

# Add custom repo files
COPY repos/*.repo /etc/yum.repos.d/

# Install required RPMs and ensure that the packages were installed
RUN yum install -y --disablerepo=\* --enablerepo=jboss-rhel-optional --enablerepo=jboss-rhel-os --enablerepo=jboss-rhel-ose --enablerepo=jboss-rhel-rhscl rh-maven33 \
    && yum clean all && \
    rpm -q  rh-maven33

# Remove custom repo files
RUN rm /etc/yum.repos.d/jboss-rhel-optional.repo /etc/yum.repos.d/jboss-rhel-os.repo /etc/yum.repos.d/jboss-rhel-ose.repo /etc/yum.repos.d/jboss-rhel-rhscl.repo

# Add all artifacts to the /tmp/artifacts
# directory
COPY \
    hawkular-javaagent-1.0.0.CR5-redhat-1-shaded.jar \
    jolokia-jvm-1.3.6.redhat-1-agent.jar \
    /tmp/artifacts/

# Add scripts used to configure the image
COPY scripts /tmp/scripts

# Custom scripts
USER root
RUN [ "bash", "-x", "/tmp/scripts/s2i-common/install.sh" ]

USER root
RUN [ "bash", "-x", "/tmp/scripts/os-java-misc/install_as_root" ]

USER root
RUN [ "bash", "-x", "/tmp/scripts/os-java-s2i/install_as_root" ]

USER root
RUN [ "bash", "-x", "/tmp/scripts/os-java-jolokia/install_as_root" ]

USER root
RUN [ "bash", "-x", "/tmp/scripts/os-java-hawkular/install_as_root" ]

USER root
RUN [ "bash", "-x", "/tmp/scripts/os-java-run/install_as_root" ]

USER root
RUN rm -rf /tmp/scripts

USER root
RUN rm -rf /tmp/artifacts

USER 185
CMD ["/usr/local/s2i/run"]

LABEL "description"="Platform for building and running plain Java applications (fat-jar and flat classpath)" "url"="https://access.redhat.com/containers/#/registry.access.redhat.com/redhat-openjdk-18/openjdk18-openshift/images/1.1-11" "build-date"="2017-08-03T07:29:33.405292" "com.redhat.build-host"="ip-10-29-120-158.ec2.internal" "vcs-ref"="46f072e71f728aa55433d107f2d7888cd1fac702"
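
For comparison, a handcrafted version might look roughly like the sketch below. It is an illustration, not dogen output: it assumes the install scripts can run back to back in one shell, and it leaves out the ENV, LABEL and EXPOSE instructions because they would stay as they are. The COPY layers still carry the repo files, scripts and artifacts; merging the RUN steps only removes the redundant USER layers and the separate cleanup layers.

```dockerfile
# Sketch only: ENV, LABEL and EXPOSE from the generated file are omitted here.
FROM sha256:10b16692948966fb5902ba92186c79d78818463f1d90b542e70a82a3e1339667

# A single USER instruction covers every root-level step below.
USER root

COPY repos/*.repo /etc/yum.repos.d/

# Install, verify, clean the yum cache and delete the repo files in one layer.
RUN yum install -y --disablerepo=\* \
        --enablerepo=jboss-rhel-optional --enablerepo=jboss-rhel-os \
        --enablerepo=jboss-rhel-ose --enablerepo=jboss-rhel-rhscl \
        rh-maven33 \
    && yum clean all \
    && rpm -q rh-maven33 \
    && rm /etc/yum.repos.d/jboss-rhel-optional.repo \
          /etc/yum.repos.d/jboss-rhel-os.repo \
          /etc/yum.repos.d/jboss-rhel-ose.repo \
          /etc/yum.repos.d/jboss-rhel-rhscl.repo

COPY hawkular-javaagent-1.0.0.CR5-redhat-1-shaded.jar \
     jolokia-jvm-1.3.6.redhat-1-agent.jar \
     /tmp/artifacts/
COPY scripts /tmp/scripts

# Run every install script and clean up in the same layer.
# "set -e" aborts the build as soon as one script fails.
RUN set -e; \
    for script in \
        s2i-common/install.sh \
        os-java-misc/install_as_root \
        os-java-s2i/install_as_root \
        os-java-jolokia/install_as_root \
        os-java-hawkular/install_as_root \
        os-java-run/install_as_root; \
    do \
        bash -x "/tmp/scripts/$script"; \
    done; \
    rm -rf /tmp/scripts /tmp/artifacts

USER 185
CMD ["/usr/local/s2i/run"]
```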

goldmann commented 7 years ago

Let me go through the criticism point by point.

Lots of extra layers created with no effect, for example many USER layers

That's correct. Since we are abstracting script execution, we need to be able to define the user that runs each script. It's true that this could be optimized (some of the USER commands are unnecessary because we've already switched to that particular user).

Commands that can be concatenated into fewer layers (like RUN) are not

That's also true, but there is always a trade-off between fewer layers and being able to rebuild the image quickly from cache. If you concatenate the executions, a change in any one script invalidates the cached result of all the others. If these are long-running tasks, you will end up opening a ticket asking for exactly the opposite of what you are suggesting now.
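
The trade-off can be sketched as follows. Docker reuses a cached layer only while the instruction and, for COPY, the copied files are unchanged; once one layer is rebuilt, every later layer is rebuilt as well. The layout below is an assumption for illustration, not what dogen emits: the base image is hypothetical, and the module directories are inferred from the script paths in the example above. Each module's scripts are copied right before they run, so editing a late module leaves the earlier layers cached.

```dockerfile
# Hypothetical base image and layout, for illustration only.
FROM centos:7
USER root

# Editing a file under scripts/os-java-run does not touch these two layers,
# so they are served from the build cache.
COPY scripts/s2i-common /tmp/scripts/s2i-common
RUN [ "bash", "-x", "/tmp/scripts/s2i-common/install.sh" ]

# Only this pair, and anything after it, is rebuilt when its scripts change.
COPY scripts/os-java-run /tmp/scripts/os-java-run
RUN [ "bash", "-x", "/tmp/scripts/os-java-run/install_as_root" ]
```

Merging both scripts into a single RUN after a single COPY would save layers, but any script change would then rerun everything.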

Files are removed in later layers, literally making the images larger because of the COW filesystem

That's true again. But we don't care about this, and let me explain why we do things the way we do so you get a better picture.

This tool was created to overcome Docker limitations in the first place. Dockerfile syntax is just broken: it sometimes doesn't make sense, it's hard to maintain, it does not allow sharing scripts or files between images, and so on. I could create a fairly long list of reasons why it's not optimal.

At the same time, the Dockerfile is (still) a kind of standard for building container images. This will hopefully change soon, and I already see promising projects (https://github.com/projectatomic/buildah for example).

OK, now we know that the Dockerfile is broken and that it's a standard. Our approach is to:

  1. Use (abstract) image descriptors so that we can generate anything from them and are not tied to a specific builder/project/product.
  2. Not care much about Dockerfiles, since they are just an intermediate artifact that is fed into the build system. We care about the image descriptor and about the image only.

Since Dockerfiles are broken by design, there are many issues we need to fight, and we still see them in our generated Dockerfiles. The most important issues are:

  1. Too many unnecessary layers
  2. Files added in one layer and removed in a different layer are not removed from the image, just hidden.
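
The second issue can be reproduced with a small sketch. The base image and file names here are hypothetical, picked only to make the effect visible: a file created in one layer and deleted in the next still ships with the image, while creating and deleting it within a single RUN leaves nothing behind.

```dockerfile
# Hypothetical base image, chosen only for illustration.
FROM busybox

# Layer 1: adds roughly 100 MB to the image.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1024k count=100

# Layer 2: records the deletion, but layer 1 still ships with the image.
RUN rm /tmp/big.bin

# Same work in one layer: the file never reaches a committed layer.
RUN dd if=/dev/zero of=/tmp/big2.bin bs=1024k count=100 \
    && rm /tmp/big2.bin
```

Running docker history on the resulting image should show the first RUN layer at about 100 MB and the last one at close to zero.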

I agree with everything you pointed out, but these are Dockerfile syntax issues rather than generator issues, and luckily they can be solved. For this purpose we post-process every single container image we release using this tool: https://github.com/goldmann/docker-squash. I won't repeat that project's README here, but it solves exactly the issues I listed above.

So, to recap: at development time we're perfectly fine with images that are a bit "too big", but when we release them we make them pretty.

Hope you understand the background better now.

goldmann commented 7 years ago

BTW, Dogen is now in maintenance mode. The next generation of this tool is available here: https://github.com/jboss-container-images/concreate. It's still early days (the first RC was released), but it'll replace this tool soon.