monogon-dev / monogon

The Monogon Monorepo. May contain traces of peanuts and a ✨pure Go Linux userland✨. Work in progress!
https://monogon.tech
Apache License 2.0

//metropolis/node:image_gcp flake #251

Closed: leoluk closed this issue 1 year ago.

leoluk commented 1 year ago

Whatever this is: https://jenkins.monogon.dev/job/gerrit-presubmit-monogon/job/53%252F1953%252F1/1/console

ERROR: /home/ci/monogon/metropolis/node/BUILD.bazel:169:8: Executing genrule //metropolis/node:image_gcp failed: (Exit 1): bash failed: error executing command /bin/bash -c ... (remaining 1 argument skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
gzip: warning: GZIP environment variable is deprecated; use an alias or script
tar: disk.raw: file changed as we read it

https://github.com/Distrotech/tar/blob/273975bec1ff7d591d7ab8a63c08a02a285ffad3/src/create.c#L1788-L1793
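For context, the check at that link boils down to this: tar stats the file, reads it, stats it again, and warns if the status-change time (ctime) differs or the file grew. It then sets exit status 1, which would be consistent with the (Exit 1) above. Below is a minimal Go sketch of that comparison, not tar's code. It assumes Linux (it reads st_ctim through syscall.Stat_t), and the names statSnapshot, fileChanged and readAndCheck are hypothetical.

```go
// Sketch of GNU tar's "file changed as we read it" condition.
// Linux only: ctime is read from syscall.Stat_t.Ctim.
package main

import (
	"fmt"
	"io"
	"os"
	"syscall"
)

// statSnapshot holds the two fields tar compares before and after reading.
type statSnapshot struct {
	Size    int64
	CtimeNs int64 // status-change time, nanoseconds since the epoch
}

func snapshot(fi os.FileInfo) statSnapshot {
	st := fi.Sys().(*syscall.Stat_t)
	return statSnapshot{Size: fi.Size(), CtimeNs: st.Ctim.Nano()}
}

// fileChanged mirrors tar's condition: ctime moved, or the file grew.
func fileChanged(before, after statSnapshot) bool {
	return before.CtimeNs != after.CtimeNs || before.Size < after.Size
}

// readAndCheck reads path to EOF and reports whether tar would have warned.
func readAndCheck(path string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()
	fi, err := f.Stat()
	if err != nil {
		return false, err
	}
	before := snapshot(fi)
	if _, err := io.Copy(io.Discard, f); err != nil {
		return false, err
	}
	fi, err = f.Stat() // fstat again, as tar does
	if err != nil {
		return false, err
	}
	return fileChanged(before, snapshot(fi)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: tarcheck FILE")
		return
	}
	changed, err := readAndCheck(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(2)
	}
	fmt.Println("changed while reading:", changed)
}
```

Note that ctime also moves on metadata-only changes such as chmod or timestamp updates, so the warning alone doesn't prove the bytes changed.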

This isn't even a test. I don't really see how this could happen unless some concurrent rule modifies the output root, if that's even possible (shouldn't inputs be mapped read-only?).

May or may not be related:

leoluk commented 1 year ago

Unable to reproduce it after a few hundred clean rebuilds:

while bazel build --action_env=CAT=$(date +%s) //metropolis/node:image_gcp; do sleep 1; done

leoluk commented 1 year ago

@fionera reports that this can be reproduced with bazel test //..., so it's most likely another test interfering by somehow modifying the image.
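One hypothetical way another test could modify the image, assuming it receives disk.raw through a symlink (Bazel runfiles trees are made of symlinks) and the target file is writable by the test process: opening the symlink for writing follows it, so the shared original changes. The sketch below, with the made-up helper demoSymlinkWrite, only demonstrates that filesystem behavior; it doesn't show this is what happens in CI.

```go
// Hypothetical demo: writing through a symlink modifies the link's target.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// demoSymlinkWrite creates disk.raw and a symlink to it in a temp dir,
// writes through the symlink, and returns the target's content before
// and after the write.
func demoSymlinkWrite() (string, string, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "disk.raw")
	link := filepath.Join(dir, "runfiles-disk.raw")
	if err := os.WriteFile(target, []byte("pristine"), 0o644); err != nil {
		return "", "", err
	}
	if err := os.Symlink(target, link); err != nil {
		return "", "", err
	}
	before, err := os.ReadFile(target)
	if err != nil {
		return "", "", err
	}

	// OpenFile follows the symlink, so this write lands in disk.raw itself.
	f, err := os.OpenFile(link, os.O_WRONLY, 0)
	if err != nil {
		return "", "", err
	}
	if _, err := f.WriteAt([]byte("MODIFIED"), 0); err != nil {
		f.Close()
		return "", "", err
	}
	if err := f.Close(); err != nil {
		return "", "", err
	}

	after, err := os.ReadFile(target)
	if err != nil {
		return "", "", err
	}
	return string(before), string(after), nil
}

func main() {
	before, after, err := demoSymlinkWrite()
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Printf("before: %q, after: %q\n", before, after)
}
```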

lorenz commented 1 year ago

https://jenkins.monogon.dev/job/gerrit-presubmit-monogon/job/74%252F1874%252F22/1/console

Just got the CI to reproduce it.

lorenz commented 1 year ago

This has now reached near-100% reproducibility in CI; I can barely get a successful build anymore. I looked into it again and still cannot see what causes this.

leoluk commented 1 year ago

image_gcp was removed. When reintroducing it, we should either write a small Go-based rule for it or figure out how to make rules_pkg work: https://review.monogon.dev/c/monogon/+/2028
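If image_gcp is reintroduced as a Go-based rule, its core could be as small as the sketch below. It assumes the GCP import format is a gzip-compressed tar holding a single file named disk.raw. The names packImage, demo and packimage are hypothetical, using Go's tar.FormatGNU (so sizes over 8 GiB fit) is an untested assumption about what GCP accepts, and it doesn't emit sparse entries.

```go
// Hypothetical sketch of a tiny Go packer for a GCP image archive:
// a gzip-compressed tar containing one regular file named disk.raw.
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"os"
	"time"
)

// packImage writes exactly size bytes from src as disk.raw into a .tar.gz.
// The header (size, pinned mtime) is fixed before any data is copied.
func packImage(dst io.Writer, src io.Reader, size int64) error {
	zw := gzip.NewWriter(dst)
	tw := tar.NewWriter(zw)
	hdr := &tar.Header{
		Typeflag: tar.TypeReg,
		Name:     "disk.raw",
		Mode:     0o644,
		Size:     size,
		ModTime:  time.Unix(0, 0), // pinned for reproducible output
		Format:   tar.FormatGNU,   // assumption: GNU format so >8 GiB sizes fit
	}
	if err := tw.WriteHeader(hdr); err != nil {
		return err
	}
	// CopyN returns io.EOF if src is shorter than size; extra bytes are ignored.
	if _, err := io.CopyN(tw, src, size); err != nil {
		return err
	}
	if err := tw.Close(); err != nil {
		return err
	}
	return zw.Close()
}

// demo packs an in-memory sample and returns the first entry's name and size.
func demo() (string, int64, error) {
	sample := []byte("not really a disk image")
	var buf bytes.Buffer
	if err := packImage(&buf, bytes.NewReader(sample), int64(len(sample))); err != nil {
		return "", 0, err
	}
	zr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", 0, err
	}
	hdr, err := tar.NewReader(zr).Next()
	if err != nil {
		return "", 0, err
	}
	return hdr.Name, hdr.Size, nil
}

func check(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}

func main() {
	if len(os.Args) != 3 {
		name, size, err := demo()
		check(err)
		fmt.Printf("usage: packimage DISK_RAW OUT_TAR_GZ\ndemo entry: %s (%d bytes)\n", name, size)
		return
	}
	in, err := os.Open(os.Args[1])
	check(err)
	defer in.Close()
	fi, err := in.Stat()
	check(err)
	out, err := os.Create(os.Args[2])
	check(err)
	check(packImage(out, in, fi.Size()))
	check(out.Close())
}
```

Fixing the size in the header before copying and pinning the mtime means this tool never compares timestamps, so it sidesteps tar's warning. That is only harmless under the sandboxing-artifact explanation: if the bytes really change mid-read, it would silently produce a corrupted image instead of failing.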

The cause is unclear. It's either a correctness issue, where the file actually changes because concurrent builds or tests somehow modify their inputs, or a sandboxing artifact, where the file's timestamps aren't stable (which doesn't violate correctness, but confuses tar).
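One way to tell the two explanations apart is to capture both the ctime and a content hash of disk.raw before and after the suspect tests run. A minimal Linux-only Go sketch follows. The names capture, classify and watchimage are hypothetical, and the 10-second wait is an arbitrary stand-in for "while the tests run".

```go
// Hypothetical diagnostic: did disk.raw's bytes change, or only its ctime?
// Linux only: ctime is read from syscall.Stat_t.Ctim.
package main

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)

type fileState struct {
	CtimeNs int64
	Sum     [sha256.Size]byte
}

// capture hashes the file's content and records its ctime.
func capture(path string) (fileState, error) {
	f, err := os.Open(path)
	if err != nil {
		return fileState{}, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return fileState{}, err
	}
	fi, err := f.Stat()
	if err != nil {
		return fileState{}, err
	}
	var s fileState
	s.CtimeNs = fi.Sys().(*syscall.Stat_t).Ctim.Nano()
	copy(s.Sum[:], h.Sum(nil))
	return s, nil
}

// classify compares two captures of the same file.
func classify(before, after fileState) string {
	switch {
	case before.Sum != after.Sum:
		return "content changed"
	case before.CtimeNs != after.CtimeNs:
		return "metadata-only change"
	default:
		return "unchanged"
	}
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: watchimage FILE")
		return
	}
	before, err := capture(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	time.Sleep(10 * time.Second) // arbitrary window for the suspect tests
	after, err := capture(os.Args[1])
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Println(classify(before, after))
}
```

A changed hash would point at the correctness problem; a moving ctime with an identical hash would fit the metadata-churn explanation, since ctime also changes on chmod, chown or timestamp updates.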