mozilla / sccache

Sccache is a ccache-like tool. It is used as a compiler wrapper and avoids compilation when possible. Sccache has the capability to utilize caching in remote storage environments, including various cloud storage options, or alternatively, in local storage.
Apache License 2.0

`overlay` are not unmounted #1688

Open lissyx opened 1 year ago

lissyx commented 1 year ago

After upgrading to v0.4.0, sccache-dist became sluggish: it takes ~5 min to get through 10% of mozilla-central's configure, whereas a full build without cache takes ~4 min.

After a quick investigation, the problem appears to be on the server side. Trying to Ctrl+C the sccache-dist server process results in:

Running `mount` shows 1052 mounted overlay instances similar to:

overlay on XXX/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/target type overlay (rw,relatime,lowerdir=XXX/sccache/build/toolchains/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101,upperdir=XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/upper,workdir=XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-769/work)
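To track this count between builds without eyeballing `mount` output, one option is to count the overlay entries in `/proc/self/mounts` whose mount point lies under the builds directory. A minimal Rust sketch, assuming the standard Linux mount-table layout (device, mount point, fs type, options, dump, pass). `count_overlay_mounts` and `count_live_overlay_mounts` are hypothetical helpers, not part of sccache:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not part of sccache): count the overlay mounts whose
/// mount point lies under `builds_dir`, given text in /proc/mounts format.
/// Each line is: device mount_point fs_type options dump pass.
/// Note: the kernel escapes spaces in paths as `\040`; this sketch does not decode them.
fn count_overlay_mounts(mounts: &str, builds_dir: &str) -> usize {
    mounts
        .lines()
        .filter(|line| {
            let mut fields = line.split_whitespace();
            let _device = fields.next();
            let mount_point = fields.next().unwrap_or("");
            let fs_type = fields.next().unwrap_or("");
            // Path::starts_with compares whole components, so ".../builds2"
            // is not counted as being under ".../builds".
            fs_type == "overlay" && Path::new(mount_point).starts_with(builds_dir)
        })
        .count()
}

/// Same count against this process's live mount table (Linux only).
fn count_live_overlay_mounts(builds_dir: &str) -> io::Result<usize> {
    let mounts = fs::read_to_string("/proc/self/mounts")?;
    Ok(count_overlay_mounts(&mounts, builds_dir))
}
```

Calling `count_live_overlay_mounts` with the real builds path (redacted as `XXX/sccache/build/builds` here) before and after a build would show whether mounts are leaking, without needing to stop the server.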
sylvestre commented 1 year ago

@Xuanwo @drahnr does this ring a bell? :)

lissyx commented 1 year ago

Running a second build in a row, I end up with 3631 overlay mounts by the end of that second build, which takes a very long time to finish.

lissyx commented 1 year ago

Looks like I had the env vars and sudo the wrong way around when passing them. With them set properly, I can now see something useful (errors):

[2023-03-27T08:54:34Z ERROR sccache_dist::build] Failed to remove build directory XXX/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-38: failed to remove directory `XXX/sccache/build/builds/746755680e580184e658dbc0c5af280e31b72bf026f2d64adb215757f9099101-38`
[2023-03-27T08:54:34Z ERROR sccache_dist::build] Failed to remove build directory XXX/sccache/build/builds/f06aab84da875aa1c85313a2a200f47cf9104f1ba35eb8df51245f98effa8a51-11: failed to remove directory `XXX/sccache/build/builds/f06aab84da875aa1c85313a2a200f47cf9104f1ba35eb8df51245f98effa8a51-11`
drahnr commented 1 year ago

No, I'm not aware of this. I suspect #1628 might have caused the issue; it's the only change that touched code in the vicinity.

lissyx commented 1 year ago

https://github.com/mozilla/sccache/blob/f9cc320e4f3e9cf0a9186c981802bc68fd23ece2/src/bin/sccache-dist/build.rs#L400-L416

lissyx commented 1 year ago

> No, not aware. Suspicious that #1628 might have caused the issue, it's the only thing that modified code in the vicinity.

When I have more time I can try to bisect to confirm, but so far, going back to v0.4.0.pre.10 (before https://github.com/mozilla/sccache/commit/20a08fc079c6afe09ef6d82edccda6ca5273c8a1) seems to be enough.

My guess is that something in https://github.com/mozilla/sccache/commit/20a08fc079c6afe09ef6d82edccda6ca5273c8a1 broke how the overlays are unmounted. This shows up as the errors above: just as my manual `rm -fr` fails with EBUSY on the mount point, `fs::remove_dir_all()` is likely failing for the same reason.
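If that diagnosis is right, the cleanup is an ordering problem: the overlay has to be unmounted before the build directory is deleted, otherwise the recursive removal hits EBUSY on the still-mounted `target`. A minimal Rust sketch of that ordering, assuming the overlay sits at `<build dir>/target` as in the `mount` output above. `teardown_build_dir`, `is_busy` and `umount_cmd` are hypothetical names, not sccache's actual code, and EBUSY = 16 is the Linux value:

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// EBUSY ("Device or resource busy") on Linux; platform-specific assumption.
const EBUSY_LINUX: i32 = 16;

/// True if `err` is the EBUSY error a removal reports when a mount point
/// inside the tree is still mounted.
fn is_busy(err: &io::Error) -> bool {
    err.raw_os_error() == Some(EBUSY_LINUX)
}

/// Hypothetical teardown helper (not sccache's code): unmount the overlay at
/// `<build_dir>/target` first, then delete the whole build directory.
/// `unmount` and `remove` are injected so the ordering can be tested without root.
fn teardown_build_dir<U, R>(build_dir: &Path, unmount: U, remove: R) -> io::Result<()>
where
    U: FnOnce(&Path) -> io::Result<()>,
    R: FnOnce(&Path) -> io::Result<()>,
{
    let target = build_dir.join("target");
    // If unmounting fails, removing the tree would only hit EBUSY, so stop here.
    unmount(&target)?;
    remove(build_dir)
}

/// One possible real `unmount`: shell out to umount(8) (requires root).
fn umount_cmd(target: &Path) -> io::Result<()> {
    let status = Command::new("umount").arg(target).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("umount {} failed: {}", target.display(), status),
        ))
    }
}
```

In real use this would be called as `teardown_build_dir(dir, umount_cmd, |p| std::fs::remove_dir_all(p))`. Returning early on a failed unmount also keeps the logs honest: the reported error would be the unmount failure rather than a secondary "failed to remove directory".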

lissyx commented 1 year ago

I am still running as root via bubblewrap, in case that matters. I'll try to poke around in the code when I get some time, but I remain available to test patches and/or provide more debug output.