mirage / qubes-mirage-firewall

A Mirage firewall VM for QubesOS
BSD 2-Clause "Simplified" License

Reproducible build systems: use in GitHub action the build-with-docker.sh #164

Closed hannesm closed 1 year ago

hannesm commented 1 year ago

Also upload the artifact to GitHub action, and in addition use the same setup (ubuntu 20.04 image) and build directories as done on builds.robur.coop.

Also use strip on the resulting binary to reduce its size (since the debug sections aren't mapped into the running unikernel, there's nothing we get from them -- also they are preserved (as .debug file) and uploaded to https://builds.robur.coop if one needs them).

This entails binary reproducibility between the different systems:

The situation before:

I tried to get them reproducible, but even using a "ubuntu 22.04" image from ocaml/opam and GitHub action did not lead to the same output (it is the same C compiler, but they report versions differently for some reason).

So instead I stick to the official ubuntu 20.04 image (we can of course upgrade at any time -- but we should then as well have a worker on builds.robur.coop doing this build), which is very small in size.

The drawback is that compilation (execution of ./build-with-docker.sh) on a high-speed Internet connection will take some more time (since the OCaml compiler is bootstrapped). Another drawback is that in the Dockerfile the opam binary is manually downloaded from github.com -- reason is that we need a recent one (>= 2.1.0 for MirageOS 4; and ubuntu 20.04 only ships 2.0.5 -- solved in ubuntu 22.04 which ships 2.1.2).

The advantage is that we've build information (reproducible-builds.org terminology) / SBOM (software bill of materials) data independent of Docker:

And any user of qubes-mirage-firewall can look up their binary's sha256 (sha256 /path/to/vmlinuz ; with this PR merged it is 3f71a1b672a15d145c7d40405dd75f06a2b148d2cfa106dc136e3da38552de41) at https://builds.robur.coop/hash?sha256=3f71a1b672a15d145c7d40405dd75f06a2b148d2cfa106dc136e3da38552de41 (there's an input field on https://builds.robur.coop) and find out

All this is in my opinion a much neater story for the binaries we provide. NB that builds.robur.coop uses (a) orb (from https://github.com/roburio/orb) for building the package and (b) on a daily basis the HEAD of opam-repository (while Dockerfile has a specific commit hardcoded). This means that the latest build on builds.robur.coop may be different from the local build or GitHub action build.
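To check a local or downloaded binary against the hash published for this PR, the comparison can be scripted. A minimal sketch, assuming GNU coreutils' sha256sum is available (the sha256 command quoted above is its BSD counterpart); the function name verify_sha256 is hypothetical:

```shell
# Hypothetical helper: succeed only if FILE hashes to the EXPECTED hex digest.
# Assumes GNU coreutils' sha256sum, which prints "<digest>  <file>".
verify_sha256() {
    file=$1
    expected=$2
    # Keep only the digest column of the sha256sum output.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage (path and digest as quoted in the comment above):
#   verify_sha256 /path/to/vmlinuz \
#     3f71a1b672a15d145c7d40405dd75f06a2b148d2cfa106dc136e3da38552de41 \
#     && echo "matches the published build"
```

A mismatch does not necessarily mean a tampered binary: as noted above, builds.robur.coop follows the HEAD of opam-repository, so its latest build may legitimately differ from a local one.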

//cc @palainp

palainp commented 1 year ago

This is awesome! Thanks!

Bootstrapping OCaml 4.14 is a bit annoying, but I think the benefit of reproducibility is worth it :) Maybe it is possible to increase the cache usage and avoid a sha256 mismatch by removing the apt update? (As I see it, if a package in the install list is updated in the repositories, it may change the sum, esp. gcc for the solo5 part.)

palainp commented 1 year ago

And btw I have the same shasum, and it runs well, so LGTM!

hannesm commented 1 year ago

@palainp suggested

May be it is possible to avoid sha256 mismatch by removing the apt update?

That sounds like a good suggestion, unfortunately I get the following errors:

Step 2/12 : RUN apt install --no-install-recommends --no-install-suggests -y wget ca-certificates git patch unzip make gcc g++ libc-dev
 ---> Running in 5988b7c42f30

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
Package ca-certificates is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Unable to locate package wget
E: Package 'ca-certificates' has no installation candidate
E: Unable to locate package git
E: Unable to locate package patch
E: Unable to locate package unzip
E: Unable to locate package make
E: Unable to locate package gcc
E: Unable to locate package g+
E: Unable to locate package libc-dev
The command '/bin/sh -c apt install --no-install-recommends --no-install-suggests -y wget ca-certificates git patch unzip make gcc g++ libc-dev' returned a non-zero code: 100

Since I have barely any knowledge about "docker" and what guarantees Linux distributions give (esp. ubuntu with LTS) in terms of packaging and updates, I don't know how to proceed. I can see how the Dockerfile as suggested here will fail (with newer gcc releases, etc.).

Taking the other path, maybe we need to pick / pin the set of system packages that should be installed explicitly (i.e. name and version) -- as done by orb / builds.robur.coop in the system-packages file (which can be fed to cat filename | xargs apt-get install -y)?
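A minimal sketch of that pinning idea, assuming the system-packages file holds one name=version entry per line (the file format is an assumption here, and the function name pinned_install_cmd is hypothetical). It prints the apt-get command instead of running it and refuses entries that carry no version:

```shell
# Hypothetical helper: build an apt-get command line from a pinned list.
# Assumption: LIST contains one `name=version` entry per line.
pinned_install_cmd() {
    list=$1
    args=''
    # The `|| [ -n "$pkg" ]` part keeps a last line without trailing newline.
    while IFS= read -r pkg || [ -n "$pkg" ]; do
        case $pkg in
            '')    ;;                      # skip blank lines
            ?*=?*) args="$args $pkg" ;;    # keep name=version entries
            *)     echo "unpinned entry: $pkg" >&2; return 1 ;;
        esac
    done < "$list"
    printf 'apt-get install -y --no-install-recommends%s\n' "$args"
}

# Usage: inspect the command first, then run it if it looks right.
#   pinned_install_cmd system-packages
```

Rejecting unversioned entries is a deliberate choice in this sketch: a package without an explicit version would silently float to whatever the repository currently offers.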

As alternative we can push intermediate docker containers (somewhere?) and have them be used by the script, but then certainly two more issues are present: how to reproduce that container and who is in charge of updating it?

palainp commented 1 year ago

Oh that's bad. I should have tried it out myself before proposing that, sorry :/ To help, I think we can update the Dockerfile like this:

diff --git a/Dockerfile b/Dockerfile
index c511cdb..56ca5f3 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -4,7 +4,8 @@
 # ubuntu-20.04
 FROM ubuntu@sha256:b25ef49a40b7797937d0d23eca3b0a41701af6757afca23d504d50826f0b37ce

-RUN apt update && apt install --no-install-recommends --no-install-suggests -y wget ca-certificates git patch unzip make gcc g++ libc-dev
+RUN apt-get update && apt-get install --no-install-recommends --no-install-suggests -y wget ca-certificates git patch unzip make \
+gcc='4:9.3.0-1ubuntu2' g++='4:9.3.0-1ubuntu2' libc6-dev='2.31-0ubuntu9.9'
 RUN wget -O /usr/bin/opam https://github.com/ocaml/opam/releases/download/2.1.3/opam-2.1.3-i686-linux && chmod 755 /usr/bin/opam

 ENV OPAMROOT=/tmp

So far I get the same sha256sum, and future builds can only fail once those specific versions are removed from the ubuntu repositories, so any drift will show up as an error :)

Does that sound good to you @hannesm?

hannesm commented 1 year ago

Dear @palainp, thanks for the suggestion. The issue I have with hardcoding the versions of some system packages is that more system packages get installed, both as dependencies and through opam's depext mechanism. So, should they be constrained as well?

The safest (resulting in reproducible builds) would be to use nearly the whole system-packages from https://builds.robur.coop/job/qubes-firewall/build/14878d91-62b2-4ad8-bde5-acb23f6c6575/f/system-packages (orb and builder could be removed) as input to the Dockerfile. But that would be cumbersome from a maintenance point of view.

h01ger commented 1 year ago

On Sun, Nov 13, 2022 at 03:56:35PM -0800, Hannes Mehnert wrote:

As alternative we can push intermediate docker containers (somewhere?) and have them be used by the script, but then certainly two more issues are present: how to reproduce that container and who is in charge of updating it?

Debian developers do publish and update reproducible Debian images on docker.io (which btw are also usable with podman).

-- cheers, Holger

⢀⣴⠾⠻⢶⣦⠀ ⣾⠁⢠⠒⠀⣿⡁ holger@(debian|reproducible-builds|layer-acht).org ⢿⡄⠘⠷⠚⠋⠀ OpenPGP: B8BF54137B09D35CF026FE9D 091AB856069AAA1C ⠈⠳⣄

There are only two kinds of nazis: stupid ones and those without an excuse. (Volker Strübing)

hannesm commented 1 year ago

Thanks for your comment, @h01ger. Would you mind pointing me to a "reproducible Debian image on docker.io"? Maybe we can base our builds on that?

Is there a story about installing system packages in a reproducible way? So, does such a Docker image contain an apt cache/database to install stuff from (since running apt update may result in installing some unknown version that may affect reproducibility)?

h01ger commented 1 year ago

On Mon, Nov 14, 2022 at 02:55:32AM -0800, Hannes Mehnert wrote:

Thanks for your comment, @h01ger. Would you mind pointing me to a "reproducible Debian image on docker.io"? Maybe we can base our builds on that?

https://hub.docker.com/_/debian (worth reading in full) and https://docker.debian.net/

Is there a story about installing system packages in a reproducible way?

debrebuild from src:devscripts should be able to do that, though https://github.com/fepitre/debrebuild might be a better start. (the latter - despite the name which is about to be changed - is not a fork but rather a rewrite in python.)

-- cheers, Holger

When you’re used to privilege, equality feels like oppression.

hannesm commented 1 year ago

Thanks again, @h01ger. I think there's some misunderstanding, so let me rephrase what I'm looking for:

  1. we need a base image, thanks to reproducible Debian image on https://hub.docker.com/_/debian
  2. we need some packages (such as gcc, libc-dev, ...) installed onto the base image and would like to ensure that the very same package is installed (since e.g. upgrading the C compiler will result in a different binary)
  3. on top of that, we'll install opam and the qubes-mirage-firewall (we ensure ourselves by using a specific opam-repository commit to always get the same opam packages (tarballs))

To solve (2), I can see two ways: either (1) has an apt cache and we do not need to run apt update, or we provide a list with versions (and/or hashes) of Debian packages to install. Do you know how other projects try to deal with (2), or do you have some ideas on how to approach this?

h01ger commented 1 year ago

On Mon, Nov 14, 2022 at 03:29:26AM -0800, Hannes Mehnert wrote:

Thanks again, @h01ger. I think there's some misunderstanding, so let me rephrase what I'm looking for:

  1. we need a base image, thanks to reproducible Debian image on https://hub.docker.com/_/debian
  2. we need some packages (such as gcc, libc-dev, ...) installed onto the base image and would like to ensure that the very same package is installed (since e.g. upgrading the C compiler will result in a different binary)
  3. on top of that, we'll install opam and the qubes-mirage-firewall (we ensure ourselves by using a specific opam-repository commit to always get the same opam packages (tarballs))

yes, i understand.

To solve (2), I can see two ways: either (1) has an apt cache and we do not need to run apt update, or we provide a list with versions (and/or hashes) of Debian packages to install. Do you know how other projects try to deal with (2), or do you have some ideas on how to approach this?

i've already replied to this by pointing to the various debrebuild implementations. surely they are not directly suitable for your use case, but they could serve as inspiration.

and then, you could also use http://snapshot.notset.fr/archive/debian/20220523T033231Z/ instead of https://deb.debian.org for your apt sources.list.

(snapshot.notset.fr will soon (tm) also be available under snapshot.reproducible-builds.org but we are not there yet. the archives are copies from snapshot.debian.org which in our experience has scaling issues...)
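The snapshot suggestion above could translate into an apt source entry along these lines. A minimal sketch: the function name snapshot_source_line, the suite bullseye and the component main are assumptions for illustration; only the archive URL and timestamp come from the link above. The check-valid-until=no option is included because Release files in an old snapshot may be past their Valid-Until date.

```shell
# Hypothetical helper: emit a sources.list entry for a frozen archive snapshot.
# Assumptions: the snapshot host mirrors the usual Debian archive layout below
# the timestamp directory, and "main" is the only component needed.
snapshot_source_line() {
    timestamp=$1
    suite=$2
    # check-valid-until=no disables the Release file expiry check for this
    # source only, since an old snapshot may carry an expired Valid-Until.
    printf 'deb [check-valid-until=no] http://snapshot.notset.fr/archive/debian/%s/ %s main\n' \
        "$timestamp" "$suite"
}

# Usage, e.g. in an image build step before `apt-get update`:
#   snapshot_source_line 20220523T033231Z bullseye > /etc/apt/sources.list
```

With the archive frozen at one timestamp, apt-get update always sees the same package index, so the installed versions no longer depend on the day the image is built.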

you might also like https://github.com/fepitre/debian-snapshot#api

-- cheers, Holger

Because things are the way they are, things will not stay the way they are. (Bertolt Brecht)

hannesm commented 1 year ago

Thanks for the additional hyperlinks @h01ger, especially the debian archive snapshots may be very suitable for our use.

h01ger commented 1 year ago

On Mon, Nov 14, 2022 at 04:07:53AM -0800, Hannes Mehnert wrote:

Thanks for the additional hyperlinks @h01ger, especially the debian archive snapshots may be very suitable for our use.

very happy to help and to see all this recent activity on qubes-mirage-firewall too! :thumbsup:

-- cheers, Holger

The devel is in the details.

hannesm commented 1 year ago

I tried the "let's use the system-packages from builds.robur.coop in the Dockerfile" approach without much success, since builds.robur.coop uses an older ubuntu 20.04 and the packages are not available anymore.

I suggest we merge this PR as is, and then work on what @h01ger proposed:

That will also require using the same container for builds on builds.robur.coop (but that is not an issue, just a reminder). I'll open a separate issue for this.