rust-lang / crater

Run experiments across parts of the Rust ecosystem!
https://crater.rust-lang.org

Disk full errors should be spurious #700

Status: Open. Noratrieb opened this issue 1 year ago.

Noratrieb commented 1 year ago

https://crater-reports.s3.amazonaws.com/pr-115235/try%235a3d3b91048a0adf280e7a4e589c1dda6443f172/gh/SapientAsh.solstice_calculator/log.txt

This is a spurious error (the build machine ran out of disk space), but it isn't marked as such.

```
[INFO] [stdout]   = note: /usr/bin/ld: final link failed: No space left on device
[INFO] [stdout]           collect2: error: ld returned 1 exit status
```

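A minimal sketch of how a log like this could be recognized as a disk-full failure, assuming plain substring matching on the log text. The function `is_disk_full` and the `DISK_FULL_MARKERS` list are hypothetical, not crater's actual code. Matching is case-insensitive because `ld` prints "No space left on device" (the standard `ENOSPC` message), while the Docker daemon errors reported later in this thread use lowercase.

```rust
/// Hypothetical marker list, taken from the log excerpts in this issue,
/// not from crater's source. All entries are lowercase.
const DISK_FULL_MARKERS: &[&str] = &["no space left on device"];

/// Returns true if any line of `log` reports a full disk.
/// Each line is lowercased first, so "No space left on device" (ld)
/// and "no space left on device" (Docker daemon) both match.
fn is_disk_full(log: &str) -> bool {
    log.lines().any(|line| {
        let line = line.to_ascii_lowercase();
        DISK_FULL_MARKERS.iter().any(|marker| line.contains(marker))
    })
}
```

With this, both the linker failure above and the `Error response from daemon: ... no space left on device` lines would be classified as disk-full, while an ordinary compiler error would not.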
RalfJung commented 1 year ago

https://crater-reports.s3.amazonaws.com/pr-116284/index.html also has quite a few of these.

tbu- commented 6 months ago

Also happened here: https://crater.rust-lang.org/ex/pr-124636.

Skgland commented 6 months ago

> Also happened here: https://crater.rust-lang.org/ex/pr-124636.

I see 213 entries correctly sorted under `spurious-regressed` with `build no space left on device`, and 9 or 10 under `regressed` with `build failed (unknown)`.

For those 10, the errors are:

- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/d0392f4632b22de6528ebe4efcc10f002656b8723c4a1b710ed4458e6b02ea3e-init: no space left on device`
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/bd2885f2ca702c73f2089fd560ca5812020f20a703dd920f909fcf68b043099e/diff: no space left on device`
- `[INFO] [stderr] Error response from daemon: symlink ../1744f2c6d6c1b7751a30adc89afb973fac6960b6b240c59be797348b8d713855/diff /var/lib/docker/overlay2/l/YQMKEEA6H4GSSXCHI2Y5LO3XBH: no space left on device`
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/0fa118698fba3bdf766053a429fda884e20458aa114597c09bd675a92e77adcb-init: no space left on device`
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/35445b29951a97ad96f344bd33abb3c7976dcd0a7db9620eb7ebfaed05284b94-init: no space left on device`
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/5cd8254e4074e5a5dc108cf93c21d54d17667b05d9d70dd70a7404d4a2600f55-init: no space left on device`
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/d1f525505d6a64c02a2fc42e1e7e0fa163877b4776a8c7ca3c94d69358ca848e-init: no space left on device`
- `[INFO] [stderr] Error response from daemon: write /var/lib/docker/containers/08cdd65fbdc00b34e781e9ffe505de039f45012ef9e480a41d13c7b4780739d5/.tmp-config.v2.json2346282875: no space left on device`
- log truncated, no error shown, so it is uncertain whether this is even a `no space left on device` error
- `[INFO] [stderr] Error response from daemon: mkdir /var/lib/docker/overlay2/9404567f07eaeadab3dc07a00c6d4ea16806738d6eaa4b83b6ed78c401868307-init: no space left on device`

So it looks like, for all of these, the Docker daemon failed to perform some action because the device was full. Another source, not seen here, is described in #715: when a dependency fails to compile, the result is always a `DependsOn` failure, even if a `no space left on device` error occurred. This can be seen in the crater results RalfJung posted above, under regressed: dependencies:

> https://crater-reports.s3.amazonaws.com/pr-116284/index.html also has quite a few of these.
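To illustrate the `DependsOn` problem, here is a simplified, hypothetical model (not crater's real types) in which a dependency failure records why each dependency failed. A disk-full error anywhere in the dependency chain then makes the dependent crate's result spurious too, instead of hiding it behind a plain `DependsOn`.

```rust
/// Hypothetical, simplified failure reasons; crater's actual enum differs.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FailureReason {
    NoSpace,
    CompilerError,
    Unknown,
    /// The crate failed because these dependencies failed,
    /// stored as (dependency name, that dependency's own failure reason).
    DependsOn(Vec<(String, FailureReason)>),
}

impl FailureReason {
    /// A failure is spurious if it is (or transitively depends on) a full disk.
    fn is_spurious(&self) -> bool {
        match self {
            FailureReason::NoSpace => true,
            FailureReason::CompilerError | FailureReason::Unknown => false,
            // Recurse into dependencies: one spurious dependency failure
            // makes this crate's result unreliable as well.
            FailureReason::DependsOn(deps) => deps.iter().any(|(_, reason)| reason.is_spurious()),
        }
    }
}
```

Using `any` rather than `all` is a deliberate choice in this sketch: if even one dependency failed only because the disk was full, the crate never got a fair build, so its result says nothing about the change being tested.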

tbu- commented 6 months ago

What leads to the disk being full for crater?

Skgland commented 6 months ago

To reduce computation time, crater does not reset to a pristine environment between crates, so that builds of dependencies can be reused. As a result, build artifacts accumulate and can fill up the disk.

There is supposed to be a disk-space cleanup every now and then (see this comment: https://github.com/rust-lang/crater/blob/master/src/runner/tasks.rs#L172-L185) that cleans up when the disk gets full. The disk-space watcher is started here: https://github.com/rust-lang/crater/blob/master/src/runner/mod.rs#L98, and it appears to be currently configured to run every 30 seconds with a threshold of 80%.
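A minimal sketch of a periodic disk-space watcher of the kind described, not crater's implementation. It assumes the caller supplies a `probe` closure returning (used bytes, total bytes) and a `cleanup` callback; `needs_cleanup`, `watch`, and both closures are hypothetical names. The 30-second interval and 80% threshold would be passed in as `interval` and `threshold`.

```rust
use std::thread;
use std::time::Duration;

/// Returns true when the used fraction of the disk is at or above `threshold`
/// (e.g. 0.80 for 80%). An empty or unknown disk (total == 0) never triggers.
fn needs_cleanup(used_bytes: u64, total_bytes: u64, threshold: f64) -> bool {
    if total_bytes == 0 {
        return false;
    }
    used_bytes as f64 / total_bytes as f64 >= threshold
}

/// Polls `probe` once per `interval`, for `iterations` rounds, and calls
/// `cleanup` whenever usage crosses `threshold`. Returns how many cleanups ran.
fn watch<P, C>(interval: Duration, threshold: f64, iterations: usize, mut probe: P, mut cleanup: C) -> usize
where
    P: FnMut() -> (u64, u64),
    C: FnMut(),
{
    let mut cleanups = 0;
    for _ in 0..iterations {
        let (used, total) = probe();
        if needs_cleanup(used, total, threshold) {
            cleanup();
            cleanups += 1;
        }
        thread::sleep(interval);
    }
    cleanups
}
```

A production setup would call something like `watch(Duration::from_secs(30), 0.80, usize::MAX, probe, cleanup)` on a background thread. One limitation of any polling design like this: a build that writes a lot of data between two checks can still fill the disk before the next poll notices, which could be one way these errors slip through.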

tbu- commented 6 months ago

I see, thank you.

So in a sense, disk-full errors are spurious, but they shouldn't simply be ignored; instead, the affected crates should perhaps be retried after some disk space has been cleaned up.
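The retry idea could look roughly like this minimal sketch. `BuildOutcome`, `build_with_retry`, and the `build`/`cleanup` closures are hypothetical stand-ins, not crater APIs: a disk-full result triggers a cleanup and another attempt, while an ordinary failure is returned immediately.

```rust
/// Hypothetical outcome of a single build attempt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildOutcome {
    Success,
    Failed,
    DiskFull,
}

/// Runs `build`; whenever it reports a full disk, runs `cleanup` and tries
/// again, for at most `max_retries` extra attempts. Any other outcome
/// (success or a real failure) is returned as soon as it happens.
fn build_with_retry<B, C>(max_retries: usize, mut build: B, mut cleanup: C) -> BuildOutcome
where
    B: FnMut() -> BuildOutcome,
    C: FnMut(),
{
    let mut outcome = build();
    let mut retries = 0;
    while outcome == BuildOutcome::DiskFull && retries < max_retries {
        cleanup();
        retries += 1;
        outcome = build();
    }
    // If the disk is still full after all retries, the caller can report
    // the crate as spurious rather than as a regression.
    outcome
}
```

Capping the number of retries keeps a machine whose disk cannot be freed from looping forever; after the cap, the result is still `DiskFull` and can be sorted under spurious results.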