hilbert / hilbert-docker-images

Application with a dynamic choice of docker containers to run
Apache License 2.0

Make (automated) builds more reliable #17

Open porst17 opened 8 years ago

porst17 commented 8 years ago

I see several different approaches to tackle this problem. We may also use a combination of them.

  1. Tag Docker images with versions. Once an image on Docker Hub is tagged as something other than latest, it is never changed. Each FROM directive then has to refer to a tagged version that is not latest. Currently, all our Dockerfiles except for :base just use the :latest image, which is subject to change at any time. This provides reproducible builds for the top-most Dockerfile across different hosts, as long as the same package versions are in the package repositories.
  2. Use version pinning when installing packages via Dockerfile (a sketch follows this list). This more or less ensures reproducible builds. It may cause builds to break, but at least we would be able to track down issues that are caused or solved by unpredictably updated packages between builds. Once a build breaks due to packages no longer being available in the package repo, the Dockerfile has to be updated and tested (which is IMO better than blindly accepting all incoming updates). If combined with 1, only a change in the git repository will result in a different build output (more or less: version pinning is not 100% reliable, since not all transitive dependencies are pinned as well).
  3. Use a purely functional package manager like Nix to install packages. I am not sure whether this is feasible inside Docker images, but it would solve many side effects of traditional package management.
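
For illustration, a minimal Dockerfile sketch combining 1 and 2 could look like this (the hilbert/base image name and its 0.1.0 tag are hypothetical; the xserver-xorg version string is just an example and has to exist in the Ubuntu repositories):

```dockerfile
# Approach 1: build on a fixed, immutable tag instead of a mutable one
# (image name and version tag are hypothetical)
FROM hilbert/base:0.1.0

# Approach 2: pin the version of every package we install explicitly
# (the exact version string must be available in the package repositories)
RUN apt-get update && \
    apt-get install -y xserver-xorg=1:7.7+1ubuntu8.1
```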

1 and 2 should already solve most of our problems with reproducibility.

porst17 commented 8 years ago

My statement about the :latest tag above is a bit confusing. We are abusing tags, and :latest is reserved by Docker, just like HEAD is by git. :latest in my comment above just refers to the latest version of the image for a certain abused tag.
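
To make the abuse concrete: all of our images live under a single repository and are distinguished only by their tags, e.g. (repository name hypothetical):

```sh
# the role of each image is encoded in its tag; both tags are mutable
# (the repository name hilbert/hilbert is hypothetical)
docker pull hilbert/hilbert:base    # the "latest" :base image
docker pull hilbert/hilbert:kiosk   # the "latest" :kiosk image
```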

malex984 commented 8 years ago

IMO, going back in git history and building /kiosk (together with its parents) should give you back the :kiosk that you had before (more or less).

  1. So... do you want to stop abusing tags, or to abuse them even more by adding some ID?
  2. Well, we rely on Ubuntu LTS, which to me means that nothing terribly bad can happen due to an update followed by the installation & configuration of a new package... (it has not happened so far). Do you want to track down ALL the dependencies of all packages and explicitly specify all of them with their versions each time we install any package?
  3. So should we now switch to something different from Docker (like Nix) altogether?

It would be a shame if we could no longer make use of automated builds on Docker Hub because of the additional versions... or had to reconfigure automated building by hand on each version change... :(

Moreover, IMHO versions for Docker images should not be introduced artificially by hand - they can be generated automatically (e.g. from the git repository)!

porst17 commented 8 years ago

Going back in git history only brings you back to the previous state of your repository, i.e. to a previous state of your shell scripts and Dockerfiles. It does not bring you back to the list of packages that went into a previous build. Package repositories are updated independently of your git repository, and package versions change all the time. Even if both of us build a Docker image at about the same time, there can always be different packages in the repositories. This is a race condition.

Even if we rely on Ubuntu 14.04 LTS as a base image, we have additional external package managers that pull in bleeding-edge software. npm is the most prominent example (for installing Electron). Another example is Python, if we use pip.
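
The same pinning discipline would apply to these package managers as well; a rough Dockerfile fragment as a sketch (the package versions below are hypothetical, and npm/pip are assumed to be present in the image already):

```dockerfile
# Pin npm-installed packages to an exact version instead of the latest release
# (version is hypothetical; assumes npm is already installed in the image)
RUN npm install -g electron@1.4.0

# The same for pip-installed packages (version is hypothetical)
RUN pip install requests==2.9.1
```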

Anyway, your argument is just invalid. As an example, consider my two builds of :kiosk yesterday and the day before yesterday: the contents of the git repository did not change at all, but the two images are completely different. The old one shows the 4k bug #16, the new one doesn't. I cannot go back, because I just don't know what the old state was. It is not reproducible. The same thing may happen with two builds that start arbitrarily close together in time (race condition again).

Even your suggestion of make pull is invalid. Because our images are unversioned, I can never know when or whether you updated an image on Docker Hub, be it by manual upload or by some automated build. With automated builds it gets even worse: the `:kiosk` image is based on a chain of six unversioned images and one versioned image (phusion/baseimage:0.9.16). This gets messy pretty easily.

To address your specific points:

  1. As a short-term solution, we can add a version id to the tag, e.g. :base-0.1.0 (I suggest using semantic versioning, as always).
  2. It is not about something going terribly wrong, it is about reproducibility and unexpected behavior. And we are using some bleeding-edge packages. In essence, every package with a version like 0.x.x can completely change its behavior from one release to the next ("Major version zero (0.y.z) is for initial development. Anything may change at any time. The public API should not be considered stable." http://semver.org/; it's common practice). Even if libraries can be considered more or less stable in Ubuntu 14.04, end-user applications may break at any time. E.g. Chromium, Google Chrome, Firefox and Opera change their major version number every couple of weeks, and "Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API" (semver.org). These packages are even in Ubuntu 14.04 LTS (example: Firefox 42 was released a couple of days ago). To summarize: we should pin the versions of every package we explicitly install via apt-get. I am fine with partial pinning like apt-get install xserver-xorg=1:7.7+* instead of apt-get install xserver-xorg=1:7.7+1ubuntu8.1, so we still get some (distro) patches and bug fixes while keeping the feature set stable (see the sketch after this list).
  3. My idea was just to use Nix inside Docker, but maybe switching is an option. I think some kind of container technology is still necessary in order to isolate the apps at runtime; Nix and friends just allow installing things side by side. For example, you cannot run two servers on port 80 at the same time if you have two web server apps on the same machine, whereas with two different containers this is trivial as long as the ports are not exported. I would postpone this decision.
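
Regarding the partial pinning in point 2, a sketch in Dockerfile form (the wildcard is resolved by apt against the versions currently available in the repositories):

```dockerfile
FROM ubuntu:14.04

# Partial pin: stay on the 1:7.7 feature release, but still accept
# distro patch revisions (the '*' matches any packaging suffix)
RUN apt-get update && \
    apt-get install -y 'xserver-xorg=1:7.7+*'
```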

Why should the versioning approach break automated builds on Docker Hub? Introducing versions for images and using them in the FROM statement just ensures a clear dependency structure. Otherwise, all builds use :latest (in Docker terms), which is just a mess if you have images that rely on a long chain of other :latest images. You never really know when an image in that chain will change.

I can't really tell how to solve the automated build problems right now. We have everything in the same git repo, and the images depend on one another. Currently, I do not fully understand the automated build logic with respect to these two things.

malex984 commented 6 years ago

  1. Can already be done.
  2. Is generally difficult (it is not feasible for us to specify the versions of ALL packages installed in ALL our images), but we can (and usually do) specify the versions of the top-most SW packages (e.g. Kiosk-Browser, the CUDA library, NodeJS, OMD, checkmk_agent etc.), as sketched below.
  3. To be postponed for some future project.
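
A sketch of that policy, pinning only a top-most package (the version string is hypothetical and has to match what the Ubuntu repositories offer):

```dockerfile
FROM ubuntu:14.04

# Pin only the top-most software we depend on directly; its transitive
# dependencies are left to apt (the version string is hypothetical)
RUN apt-get update && \
    apt-get install -y nodejs=0.10.25~dfsg2-2ubuntu1
```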

porst17 commented 6 years ago

  1. It can, but is it used?
  2. Explicit versions should be set for all packages we explicitly install (e.g. via apt-get).