grahamc / r13y.com

NixOS Reproducibility Checker
https://r13y.com
MIT License

Extend scope #4

Open davidak opened 5 years ago

davidak commented 5 years ago

This project is amazing! It shows how far along we are toward the goal of making NixOS reproducible.

At least for nixos-unstable's iso_minimal job for x86_64-linux.

It would be nice to have this metric for the stable releases. We might get some headlines for NixOS and reproducibility.

It would also be nice to extend the scope of this "experiment" and integrate it into NixOS. I'm not sure how this works because there is no documentation (https://github.com/grahamc/r13y.com/issues/5), but you probably compare hashes? So Hydra should calculate these hashes and save them in the metadata of all builds. Then we need some distributed computing infrastructure where the community can run builds on their machines and submit their hashes. A central instance can then calculate how reproducible each package with available data is. We could use that data on our website and in the package search.

I know of BOINC for distributed computing tasks, but we might find a less complex solution.

I would like to be able to execute just one command to build packages from a job set or channel I like and submit the hashes.

grahamc commented 5 years ago

The current architecture is very simple:

Given an input expression:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L32-L46

We evaluate the derivation:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L87-L109

and then query all of the dependencies of that derivation:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L118-L126

Each dependency gets added to a queue for verification:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L127-L132
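
Roughly, the evaluate-and-query steps correspond to these commands (a sketch; the attribute and store paths are illustrative):

$ # Evaluate the expression to a derivation (.drv) file
$ nix-instantiate ./nixos/release.nix -A iso_minimal.x86_64-linux
/nix/store/...-nixos-iso.drv

$ # List the build-time dependencies (other .drv files) of that derivation
$ nix-store --query --requisites /nix/store/...-nixos-iso.drv | grep '\.drv$'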

Then, for each dependency we build it twice. First with a fairly standard set of build options, where it will almost certainly be fetched from the binary cache:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L192-L199

Then we build it a second time with --check, forcing the local machine to build it again:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L223-L233

Nix will then exit 0 if the output matches the original build bit-for-bit: https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L247
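
On the command line, the two builds per dependency look roughly like this (a sketch; the .drv path is illustrative):

$ # First build: the output is almost certainly substituted from cache.nixos.org
$ nix-build /nix/store/...-some-dependency.drv

$ # Second build: --check forces a local rebuild and compares it
$ # bit-for-bit against the existing store path
$ nix-build /nix/store/...-some-dependency.drv --check
$ echo $?   # 0 if the rebuild matched, non-zero otherwise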

Otherwise the build is not reproducible, and we do some fancier things:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L260-L276

Notably, we take the original build output and the non-matching build output, export them as NARs, and copy them to a content-addressed store (CAS):

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/check.rs#L295-L305
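
In shell terms, that step is roughly the following (a sketch; paths are illustrative, and with --check plus --keep-failed, Nix leaves the differing rebuild next to the original as <path>.check):

$ # Serialize both outputs as NARs
$ nix-store --dump /nix/store/...-foo > a.nar
$ nix-store --dump /nix/store/...-foo.check > b.nar

$ # Store each NAR under its own hash; identical NARs collapse to one file
$ for f in a.nar b.nar; do cp "$f" cas/$(sha256sum "$f" | cut -d' ' -f1).nar; done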

The result of this process is a JSON document containing a list of builds: for each path, either an object saying it was reproducible, or an object which includes the derivation and the hashes of the two NARs.
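
A hypothetical shape for that document (the field names are illustrative, not the actual messages.rs schema):

{
  "results": [
    { "drv": "/nix/store/...-hello-2.10.drv", "status": "reproducible" },
    {
      "drv": "/nix/store/...-foo-1.0.drv",
      "status": "unreproducible",
      "nar_hashes": ["sha256:aaaa...", "sha256:bbbb..."]
    }
  ]
}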


A separate process, report.rs, starts in a similar way. Given an expression:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/report.rs#L28-L42

Evaluate it and find a list of derivations and dependencies (todo: refactor :) ):

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/report.rs#L64-L113

and create a report of all the paths:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/report.rs#L136-L144

Given an unreproducible path, it takes the two NARs from the CAS, extracts them to the Nix store and runs Diffoscope on them:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/report.rs#L147-L169
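
In shell terms, roughly (a sketch; report.rs works through the Nix store, and the CAS file names are illustrative):

$ # Restore both NARs and produce an HTML diff
$ nix-store --restore output-a < cas/<hash-a>.nar
$ nix-store --restore output-b < cas/<hash-b>.nar
$ diffoscope --html diff.html output-a output-b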

From there, a simple HTML report is generated:

https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/bin/report.rs#L194-L204


The instruction, the JSON document, and the CAS are explicitly designed around being able to distribute the work to many builders, to make it easy to grow this project later. I would love help with this. You can see the data types I thought about, which are designed with this in mind, here: https://github.com/grahamc/r13y.com/blob/f144539ae80108bd8d7bf243d67011ca63198dce/src/messages.rs

One thing these data types assume is that every builder would randomize the list of derivations and try to build all of them, the idea being that having many builders try the same thing makes us more confident about the reproducibility. Now I wonder if we would want something a bit different, to allow greater coverage. My current thinking is that the central server would publish the same instruction, but also publish statistics about how many times each derivation has been built. That way builders can prioritize low-count derivations first.
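
For example, if the server published a JSON map from derivation to build count (URL and format hypothetical), a builder could pick the least-built derivation like this:

$ curl -s https://reports.example.org/build-counts.json \
    | jq -r 'to_entries | sort_by(.value) | .[0].key'
/nix/store/...-least-built.drv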

Does this help?

grahamc commented 5 years ago

One challenge which might come up by expanding scope is knowing how to visualize the list of brokenness. We shouldn't try to solve that until after it is a problem, though. Just thinking about it, as it was a tough problem for

grahamc commented 5 years ago

Some more of what I was thinking. I have no strong opinions on how to implement this, and would love any help anyone wants to provide.

CAS

I use a CAS for the NARs, thinking builders would upload the NARs to something like S3, and it would be good for them not to all re-upload the one they fetched from the cache. Even better, when the unreproducibility comes from something like the current date, it avoids uploading duplicate mismatching NARs.
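
For example (a sketch, with a hypothetical bucket name), a builder could skip NARs the CAS already has:

$ hash=$(sha256sum result.nar | cut -d' ' -f1)
$ # head-object exits non-zero if the key is absent; only then upload
$ aws s3api head-object --bucket r13y-cas --key "$hash.nar" >/dev/null 2>&1 \
    || aws s3 cp result.nar "s3://r13y-cas/$hash.nar"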

You can see some existing (currently useless) code around this, for example the "report_url".

Diffoscope

Diffoscope can take many gigabytes of RAM, especially when comparing ISOs and mksquashfs outputs. Ideally, the final architecture would run the diffoscope process on one system, upload the result to the cache, and have the website link to that cache.

People running builds should not be expected to actually run the diffoscope step.

davidak commented 5 years ago

Great ideas.

I don't understand all the details of the Rust code, but I got a good picture of the project. A README should not go into such detail; such information is probably better kept as comments in the code itself.

Most visitors or users are probably not interested in implementation details.

Nix will then exit 0 if the output matches the original build bit-for-bit:

So the comparison is done by Nix. For a visitor not familiar with Nix or NixOS, it would be good to note here how it's done, and maybe link to the Nix manual.

I think a SHA-256 hash of the build result path?

we build it twice

It would be good to just get a hash from the Hydra build and compare against that. When the hash is not identical, we can still fetch the path and compare with Diffoscope.
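
A sketch of that idea, using the NarHash the cache already publishes (see the narinfo example below). The attribute is illustrative, and disabling substitutes means dependencies get rebuilt locally too:

$ # Rebuild locally instead of fetching from the cache
$ nix-build '<nixpkgs>' -A hello --option substituters ''
$ # Print the local NAR hash in the same form as the narinfo NarHash field
$ nix path-info --json ./result | jq -r '.[0].narHash'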

One challenge which might come up by expanding scope is knowing how to visualize the list of brokenness.

That alone is a great task for someone who is a specialist in the field of data visualisation.

But that's also a task any distro active in the reproducibility challenge faces, so we can cooperate there.

grahamc commented 5 years ago

So the comparison is done by Nix. For a visitor not familiar with Nix or NixOS, it would be good to note here how it's done, and maybe link to the Nix manual.

Nix does this by creating a NAR for the build, and comparing the hashes of the NARs. Essentially the same as hashing the result path.

We just "nix-build" it twice, we don't actually perform the first build, as Hydra has (presumably) done the first. The first build is substituting the build from the cache, as --check requires the build have been done before.

davidak commented 5 years ago

Nix does this by creating a NAR for the build, and comparing the hashes of the NARs.

We just "nix-build" it twice, we don't actually perform the first build, as Hydra has (presumably) done the first.

So I think Hydra should create hashes for every package. Then we just have to get the hash, and don't need to download the whole package when it's already reproducible.

Do you think that's a good idea?

We might need to change the Hydra build jobs to create the hashes and save them somewhere, and change Nix to use those hashes for the reproducibility check. If that check fails, fetch the package...

grahamc commented 5 years ago

We already can get the hash of the NAR:

$ curl https://cache.nixos.org/$(readlink $(which bash) | cut -d/ -f4 | cut -d'-' -f1).narinfo
StorePath: /nix/store/93h01q6yg13xdrabvqbddzbk11w6a928-bash-interactive-4.4-p23
URL: nar/037ypxfkl3ggfjlvfwxhxsynk31y7wibyd35d94qqzja7mpkk1w6.nar.xz
Compression: xz
FileHash: sha256:037ypxfkl3ggfjlvfwxhxsynk31y7wibyd35d94qqzja7mpkk1w6
FileSize: 927440
NarHash: sha256:0cpr1xwqslpmjdgpg8n9fvy2icsdzr4bp0hg2f9r47fyzsm36qqp
NarSize: 5650960
References: 681354n3k44r8z90m35hm8945vsp95h1-glibc-2.27 93h01q6yg13xdrabvqbddzbk11w6a928-bash-interactive-4.4-p23 adc71v5apk4dzcxg7cjqgszjg1a6pd0z-ncurses-6.1-20190112 cinw572b38aln37glr0zb8lxwrgaffl4-bash-4.4-p23 q626bqzjsnzsqpxwd79l1501did3qy4k-readline-7.0p5
Deriver: 74r7m998kk1b5b9618yr1wy1rvrdvbga-bash-interactive-4.4-p23.drv
Sig: cache.nixos.org-1:CyY1jYISWaLV6BJML++MXP6FNUOkMSBCIFr7qZBMPWf28C74cbJGPnb1dFdye9cdb6S40I0SzHGJb3z8WpH1CA==

but I think it is not too much to ask to download the pre-built NAR anyway, as avoiding that would make this much more complex.

davidak commented 5 years ago

but I think it is not too much to ask to download the pre-built NAR anyway, as avoiding that would make this much more complex.

It would be more sustainable, as we don't waste resources, and we'd also get faster results.

So that's a topic for a feature request for Nix. I'll create one...

davidak commented 5 years ago

@grahamc do you plan to extend the project in the near future, so others can contribute builds, or does that have a low priority?

grahamc commented 5 years ago

I'm not sure. I haven't done substantial work on this project in a while now. If someone else were to contribute some code, that would surely help :)