rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Tracking peak total storage use #129808

Open workingjubilee opened 2 weeks ago

workingjubilee commented 2 weeks ago

Related August 29th CI event

On [August 29th, around 3:59 Pacific Daylight Time](https://github.com/rust-lang/rust/pull/129735#issuecomment-2317323833), our CI started to fail due to not having enough storage available. It merged a few PRs, but about 10 hours later it merged the final PR it would merge that day: https://github.com/rust-lang/rust/commit/0d634185dfddefe09047881175f35c65d68dcff1. It continued to fail for hours afterwards. The spotty CI passes were probably due to GitHub initiating a rollout to their fleet that took 12 hours to reach complete global saturation: at that moment, GitHub reduced the actual levels of storage offered to runners to levels that closely reflect their service agreement. See https://github.com/actions/runner-images/issues/10511 for more on that. Eventually, I landed https://github.com/rust-lang/rust/pull/129797, which seemed to get CI going again.

Do we take up too much space?

Our storage usage has grown, arguably to concerning levels, over time. Yes, a lot of it compresses well for transfer, but I'm talking about peak storage occupancy here, and tarballs are not a format conducive to accessing individual files, so in practice the relevant data occupies hosts in its full, uncompressed glory nonetheless. We also generate quite a lot of build intermediates. Big ones. Some of this is unavoidable, but we should consider investigating ways to reduce the storage occupancy of the toolchain and its build intermediates.

Besides, we are having issues keeping our storage usage under the amount available to CI, even if there are other aggravating events. Obviously, clearing CI storage space can be done as a dirty hack to get things running again, but changes that benefit the entire ecosystem are more desirable. However, note that a solution that reduces storage but significantly increases the number of filesystem accesses, especially during compiler or tool builds, is likely to make CI problems worse due to this fun little issue:

I'm opening this issue as a question, effectively: We track how much time the compiler costs, but what about space? Where are we tracking things like e.g. total doc size (possibly divided between libstd doc size and so on)? Are we aware of things like how much space is used by incremental compilation or other intermediates, and how it changes between versions? How about things like e.g. how many crater subjobs run out of space in each beta run? Where would someone find this information?

VorpalBlade commented 2 weeks ago

While this issue is about improving this for rustc itself, is there any issue about reducing the size of "normal" Rust programs built with Cargo? That is also a problem. For example:

jieyouxu commented 2 weeks ago

cc Metrics Initiative #128914

bjorn3 commented 2 weeks ago

I cannot use the rust-cache github action effectively because the total cache from all the different configurations (lots of architectures x stable, nightly, MSRV) ends up above the limit of 10 GB.

Have you disabled incremental compilation already? CI generally benefits less from incr comp, but it does take a lot of space. (To be precise, restoring the incr comp cache from the network can take longer than the amount of time saved by incr comp.) CARGO_INCREMENTAL=false can be used to disable it.

VorpalBlade commented 2 weeks ago

> Have you disabled incremental compilation already? CI generally benefits less from incr comp, but it does take a lot of space. (To be precise, restoring the incr comp cache from the network can take longer than the amount of time saved by incr comp.) CARGO_INCREMENTAL=false can be used to disable it.

I did that, yes, and disabled debug info too. But I target 32- and 64-bit, across x86, ARM, RISC-V, and PPC, and all three major operating systems (obviously not all combinations are possible). That quickly adds up due to combinatorial explosion: the total cache size I would currently need for that project is around 18 GB.
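For illustration, the combinatorial growth is easy to sketch. The target and toolchain lists below are illustrative stand-ins (not the actual project's matrix), and the per-configuration cache size is an assumed average, but the arithmetic shows how a modest matrix reaches that total:

```python
# Back-of-the-envelope: CI cache size scales with the full cross product of
# targets and toolchains, not with any single axis. All names and the average
# per-configuration size here are hypothetical.
targets = ["x86_64-linux", "i686-linux", "aarch64-linux",
           "armv7-linux", "riscv64-linux", "powerpc64-linux"]  # illustrative
toolchains = ["stable", "nightly", "1.70.0 (MSRV)"]            # illustrative
avg_cache_gb = 1.0  # assumed average size of one cached target/ directory

configs = len(targets) * len(toolchains)
total_gb = configs * avg_cache_gb
print(f"{len(targets)} targets x {len(toolchains)} toolchains "
      f"= {configs} caches, ~{total_gb:.0f} GB total")
```

Under these assumptions, 18 configurations at ~1 GB each lands right at the 18 GB figure, well past GitHub's 10 GB cache limit, even though no single configuration is large.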

As such it would be nice to be able to reduce the size of things that rustc produces. In my experience, C++ doesn't expand (relative to LOC) quite so voluminously. Of course, they are different languages (and Rust has more advanced abstractions), but perhaps there are things that could be done to the intermediate file formats.

workingjubilee commented 2 weeks ago

This is not actually about reducing the size per se. It is about being aware of whether or not we have reduced the size, as a first step. If we just reduce the size and then do not know the history before or after, it is a bit like merging code without regression tests. You should avoid it if you are trying to actually make an improvement, especially if you want it to stick.

the8472 commented 2 weeks ago

There's cpu-usage-over-time.py, perhaps that could be extended to also gather disk usage. Though it currently emits CSV, json-lines would be more appropriate for more structured data.
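A minimal sketch of what such an extension might look like. This is not the actual script; the function names, field names, and sampling loop are all assumptions, using `shutil.disk_usage` from the standard library and one JSON object per line (json-lines) so new fields can be added without breaking consumers:

```python
# Sketch: periodically sample disk usage and emit json-lines, which is easier
# to extend with new structured fields than the CSV the current script emits.
import json
import shutil
import time

def sample(path="/"):
    """Return one disk-usage sample for `path` as a plain dict."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free), bytes
    return {"t": time.time(), "path": path,
            "used_bytes": usage.used, "free_bytes": usage.free}

def monitor(duration_s=1.0, interval_s=0.5, path="/"):
    """Print json-lines samples for roughly `duration_s` seconds."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        print(json.dumps(sample(path)))
        time.sleep(interval_s)

if __name__ == "__main__":
    monitor()
```

Each output line is independently parseable, so a dashboard or post-processing step can pick out only the fields it knows about.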

workingjubilee commented 1 week ago

( I waffled back and forth on whether libs cares, and then I remembered all the libs PRs about compiled binary size, so... )

Anyways, nominating this for, uh, Basically Everyone, I guess. I asked some more specific questions, but also feel free to answer in a broad way, basically in two parts:

  1. What information is this team currently tracking re: size?
  2. What information would this team like to have available re: size?

@rustbot label: +I-compiler-nominated +I-libs-nominated +I-release-nominated

apiraino commented 1 week ago

Would T-infra (the only team not nominated :sweat_smile:) have more historical context? Maybe I didn't search deeply enough, but I couldn't find a thread on Zulip where they discuss this issue.

Amanieu commented 1 week ago

T-libs is not really concerned with the size of intermediate build products, that's mostly a T-compiler/T-cargo issue. We do track code size for the final binary, and keep an eye out for issues related to the size of rlibs shipped in rustup or the size of docs but that's it.

workingjubilee commented 1 week ago

> Would T-infra (the only team not nominated 😅) have more historical context? Maybe I didn't search deeply enough, but I couldn't find a thread on Zulip where they discuss this issue.

Probably, but there's no I-infra-nominated label, so in the absence of confirmation that they are attending to it, it seemed easiest to volley the question at a relevant subteam. :^) (Pinging via GitHub notifications seemed more disruptive, since they would continue to pester people in a stochastic way even after someone had taken care of responding.)

apiraino commented 2 days ago

Discussed by T-compiler last week on Zulip.

T-compiler is not aware of anything obvious that contributes to disk usage exploding, but is happy to try to make savings. We would need more information on what's taking up space during the CI run, though.

@rustbot label -I-compiler-nominated