nerc-images / jupyter-cpp-cling

A Jupyter Notebook for C++ Cling that runs on Red Hat OpenShift AI
GNU General Public License v3.0

Some suggestions on building container images #3

Closed (larsks closed this issue 8 months ago)

larsks commented 9 months ago

@hpdempsey forwarded me a thread that included a pointer to this image, and I wanted to leave a few comments on the structure of the Containerfile:

  1. Lose the MAINTAINER field. The answer to "who maintains this?" should come from the git log. Naming a single person as the maintainer discourages collaboration and can cause confusion if people fork the repository without updating that field.

    Fedora deprecated the packager field in RPM packaging some time ago for exactly these reasons.

    If you want the generated image to link back to the source repository, add a LABEL that contains the repository URL (for example, the standard OCI label org.opencontainers.image.source).

  2. When you install packages, list them one per line; this produces more meaningful diff output. Compare:

    diff --git a/Dockerfile b/Dockerfile
    index a3cca41..1a71054 100644
    --- a/Dockerfile
    +++ b/Dockerfile
    @@ -1,4 +1,4 @@
     FROM quay.io/opendatahub-contrib/workbench-images:jupyter-datascience-c9s-py311_2023c_latest
    
     USER root
    -RUN yum install -y root-cling gcc-c++ clang cmake conda xtensor-devel mlpack-bin mlpack-devel armadillo armadillo-devel gsl-devel hdf5-devel boost-devel
    +RUN yum install -y root-cling gcc-c++ clang conda xtensor-devel mlpack-bin mlpack-devel armadillo armadillo-devel gsl-devel hdf5-devel boost-devel

    With:

    diff --git a/Dockerfile b/Dockerfile
    index ba0d7e8..15e55fe 100644
    --- a/Dockerfile
    +++ b/Dockerfile
    @@ -5,7 +5,6 @@ RUN yum install -y \
            root-cling \
            gcc-c++ \
            clang \
    -       cmake \
            conda \
            xtensor-devel \
            mlpack-bin \

    (I've used yum as an example here, but this also applies to pip, conda, etc.)

  3. Install packages in batches rather than individually.

    In most cases, this:

    pip install \
      package1 \
      package2 \
      package3

    is much faster than:

    pip install package1
    pip install package2
    pip install package3

    The first form lets the package manager resolve dependencies once for the whole set, whereas the second form resolves them again on every invocation.

  4. Automate image building.

    Rather than requiring people to build images manually, include a workflow in the repository that builds the image, tags it, and pushes it to an image registry. There's an example of such a workflow here; it uses the GitHub container registry (ghcr.io) because that's easy, but the same technique applies to other registries as well.

    This can also be helpful if folks are unable to build images locally for whatever reason.
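
Putting suggestions 1 through 3 together, a Containerfile for this image might look something like the following sketch. It reuses the base image and package list from the diffs above. The repository URL in the LABEL is an assumption inferred from the repository name, so substitute the real one. org.opencontainers.image.source is the standard OCI annotation for a source link, and it is also the label ghcr.io uses to associate a pushed image with its repository, which ties in with suggestion 4.

```dockerfile
# Base image taken from the diffs above.
FROM quay.io/opendatahub-contrib/workbench-images:jupyter-datascience-c9s-py311_2023c_latest

# Link back to the source repository instead of naming a MAINTAINER.
# (URL assumed from the repository name; replace with the real one.)
LABEL org.opencontainers.image.source="https://github.com/nerc-images/jupyter-cpp-cling"

USER root

# One package per line keeps diffs readable, and a single yum
# invocation resolves dependencies once for the whole set.
RUN yum install -y \
        root-cling \
        gcc-c++ \
        clang \
        cmake \
        conda \
        xtensor-devel \
        mlpack-bin \
        mlpack-devel \
        armadillo \
        armadillo-devel \
        gsl-devel \
        hdf5-devel \
        boost-devel
```

Adding or removing a package now touches exactly one line, and the whole set is still installed in one transaction.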

computate commented 8 months ago

Thanks for the feedback here, @larsks. I'm using these suggestions in the images now.