kolloch closed 9 months ago
If I compare inside/outside of the container, I get e.g. this difference:
result_store_inside/nix/store/rc4p7zzhr481wl1vfh3zmr5kgb26shyc-layers.json/layers.json
--- result_store_outside/nix/store/rc4p7zzhr481wl1vfh3zmr5kgb26shyc-layers.json/layers.json 1970-01-01 01:00:01.000000000 +0100
+++ result_store_inside/nix/store/rc4p7zzhr481wl1vfh3zmr5kgb26shyc-layers.json/layers.json 1970-01-01 01:00:01.000000000 +0100
@@ -1,8 +1,8 @@
[
{
- "digest": "sha256:17ce2732d9ad185f1ad8e3705211fd7a94dea9888d63a57c4e1e1f1f48c67c0a",
+ "digest": "sha256:20ea6e6cb56aa5dc118ae4b4fbeddbb221ddca60a5e13677fee3c44f079ee3fe",
"size": 490496,
- "diff_ids": "sha256:17ce2732d9ad185f1ad8e3705211fd7a94dea9888d63a57c4e1e1f1f48c67c0a",
+ "diff_ids": "sha256:20ea6e6cb56aa5dc118ae4b4fbeddbb221ddca60a5e13677fee3c44f079ee3fe",
"paths": [
{
"path": "/nix/store/x4xwf94jik4dmw15q3m2vihi7kgn7qxf-policy.json"
And the whole image json. Interestingly, the store paths are NOT different but the hashes are.
--- result_store_outside/nix/store/47g6iqca8nhldrrin16imczai5fcj929-image-nix-ci.json 1970-01-01 01:00:01.000000000 +0100
+++ result_store_inside/nix/store/47g6iqca8nhldrrin16imczai5fcj929-image-nix-ci.json 1970-01-01 01:00:01.000000000 +0100
@@ -229,9 +229,9 @@
"mediatype": "application/vnd.oci.image.layer.v1.tar"
},
{
- "digest": "sha256:17ce2732d9ad185f1ad8e3705211fd7a94dea9888d63a57c4e1e1f1f48c67c0a",
+ "digest": "sha256:20ea6e6cb56aa5dc118ae4b4fbeddbb221ddca60a5e13677fee3c44f079ee3fe",
"size": 490496,
- "diff_ids": "sha256:17ce2732d9ad185f1ad8e3705211fd7a94dea9888d63a57c4e1e1f1f48c67c0a",
+ "diff_ids": "sha256:20ea6e6cb56aa5dc118ae4b4fbeddbb221ddca60a5e13677fee3c44f079ee3fe",
"paths": [
{
"path": "/nix/store/x4xwf94jik4dmw15q3m2vihi7kgn7qxf-policy.json"
@@ -295,9 +295,9 @@
"mediatype": "application/vnd.oci.image.layer.v1.tar"
},
{
- "digest": "sha256:9db223ef3ca7b32de6b8bd43488490aad02c5c3a0b41a1a6be015ff1ff1cb26a",
+ "digest": "sha256:440aaf385e0f8881fc63c339b236cf9e67320b93516abaacd6c73802052bfaf8",
"size": 8797696,
- "diff_ids": "sha256:9db223ef3ca7b32de6b8bd43488490aad02c5c3a0b41a1a6be015ff1ff1cb26a",
+ "diff_ids": "sha256:440aaf385e0f8881fc63c339b236cf9e67320b93516abaacd6c73802052bfaf8",
"paths": [
{
"path": "/nix/store/yyp0vrv39vgir6w2riq5s4cnjnxk581p-mkDirs"
I compared the path values in the ValidPath table inside the container with the contents of the /nix/store directory, and the paths were the same (with the patch; I did not check the file contents).
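The comparison described above could be sketched roughly as follows. This is only an illustration, not the commands that were actually run: it assumes the Nix SQLite database lives at /nix/var/nix/db/db.sqlite and that the table is named ValidPaths with a path column (the function name `compare_store_paths` and its parameters are made up for this example). Like the manual check, it compares names only, not contents.

```python
import os
import sqlite3


def compare_store_paths(db_path, store_dir, table="ValidPaths"):
    """Return (registered_but_missing, on_disk_but_unregistered) as sorted name lists.

    Assumption: `table` has a `path` column holding full store paths.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # Open read-only so the check cannot modify the Nix database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(f"SELECT path FROM {table}").fetchall()
    finally:
        conn.close()
    # The database stores full paths; the directory listing yields bare names.
    registered = {os.path.basename(path) for (path,) in rows}
    # Skip hidden entries such as .links, which are not store paths.
    on_disk = {name for name in os.listdir(store_dir) if not name.startswith(".")}
    return sorted(registered - on_disk), sorted(on_disk - registered)
```

Inside the container this would be called as `compare_store_paths("/nix/var/nix/db/db.sqlite", "/nix/store")`; two empty lists mean the registered paths and the directory entries agree by name.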
I will take a look later, but to debug this kind of issue, I would do something like what is explained in this comment: https://github.com/nlewo/nix2container/issues/23#issuecomment-1147710161
This would allow you to compare files in the layer produced at build time to the files in the layer produced at push time.
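Once both layer tarballs are available, a member-by-member comparison can narrow down which file or which piece of metadata changes the digest. The following is a minimal sketch using only the Python standard library; it is not part of nix2container, and the function names `describe_members` and `diff_layers` are hypothetical.

```python
import hashlib
import tarfile


def describe_members(tar_path):
    """Map each member name to the metadata and content hash that end up in the tar."""
    members = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for info in tar:
            content = None
            if info.isfile():
                content = hashlib.sha256(tar.extractfile(info).read()).hexdigest()
            members[info.name] = {
                "type": info.type,
                "mode": info.mode,
                "uid": info.uid,
                "gid": info.gid,
                "mtime": info.mtime,
                "size": info.size,
                "linkname": info.linkname,
                "sha256": content,
            }
    return members


def diff_layers(tar_a, tar_b):
    """Return {member name: changed fields, or which archive is missing the member}."""
    a, b = describe_members(tar_a), describe_members(tar_b)
    report = {}
    for name in sorted(a.keys() | b.keys()):
        if name not in a:
            report[name] = "only in second"
        elif name not in b:
            report[name] = "only in first"
        else:
            changed = sorted(key for key in a[name] if a[name][key] != b[name][key])
            if changed:
                report[name] = changed
    return report
```

An empty result means every member matches in the fields checked. Member order also affects the tar digest and is not compared here, so an empty report with differing digests would point to ordering or to header fields outside this list.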
Hi @nlewo,
Thanks for the tip! I enabled it in the test repo and also added "scripts" for reproducing the same expression in the two different docker containers.
See https://gitlab.com/nexxiot-labs/nix2container-checksum/-/blob/5c0925e48dd6ff12b5584187c26c64b88719dd76/README.md for instructions and results.
I guess it is somewhat unsurprising that the nix-database derivation is not reproducible, but the two others are weird.
(I also had a read around your code. Nice job! Unfortunately, didn't really see an issue.)
Hi @nlewo,
Tried to debug this some more and wrote my findings in here https://gitlab.com/nexxiot-labs/nix2container-checksum/-/blob/465dd01ba20887fea1457c4205713b7ae291b99a/README.md
Everything is hopefully reproducible by executing commands such as
nix run .\#x86_64-linux.gitlab.runnables.debug-building-nix-ci-in-nix-ci
These commands also include debug output.
Maybe the most surprising result first: all layers except the top-level layer are actually the same between working and non-working builds. Only the top-level image layer, which also contains the nix-database, differs.
If you add a package to that top-level layer (I did that with hello), then the build for that container succeeds.
I assume that some incorrect substitution happens otherwise, not sure.
The non-working build also has content-addressed hashes in the closure graph for some reason.
I'd be happy to debug that together. I am in the CET time zone and am rather flexible.
Hi @nlewo,
I updated my PR quite significantly:
https://github.com/nlewo/nix2container/pull/96#issuecomment-1806394908
With this PR in place, most things are deterministic.
Very weirdly, if you set reproducible=false, then the container builds. If not, the old error appears.
I stared long and hard at functions such as newLayers and the core piece of the digest/tar implementation, TarPaths.
But they honestly look quite nicely written and good to me :shrug:
I also updated the README/code of https://gitlab.com/nexxiot-labs/nix2container-checksum so that it should be quite straightforward to reproduce. At least for me, it is fully reproducible.
Crazy idea: One of the differences I observed in the closure-graph.json was that in the failing case I saw content-addressed hashes.
What if the content-addressing somehow partially rewrites existing derivations and thus changes the digest when they are read later?
Serializing the files into a tar (reproducible = false) works around this issue, since the tar at least doesn't change.
Within the nix-ci container with nix running as user, I can reproduce the issue by:
nix run .#aarch64-linux.gitlab.containers.only-stdenv.copyTo oci:test-oci
from nix2container-checksum, which is built by:
only-stdenv = nix2container.buildImage {
  name = "only-stdenv";
  copyToRoot = [nixpkgs.stdenv];
};
With my fixes from https://github.com/nlewo/nix2container/pull/96 and without stdenv, it actually works now! :)
I cannot reproduce the issue anymore 🤷
Problem
When building docker images with nix2container within an image built with nix2container, I am getting weird errors like this:
Reproduction
This repo contains everything to reproduce the error for me.
With docker and local copy:
nix run -L .\#containers.x86_64-linux.nix-ci.copyToDockerDaemon
docker run --privileged -v $PWD:/workspace -v ~/.docker:/home/user/.docker --workdir /workspace -it $(nix eval --raw ".#containers.x86_64-linux.nix-ci.imageRefUnsafe") nix run .\#containers.x86_64-linux.nix-ci.copyTo oci:oci_sample_inside
Ideas
I wonder if it is some sort of store path corruption?
I built it with and without https://github.com/nlewo/nix2container/pull/96: