taiki-e / cargo-llvm-cov

Cargo subcommand to easily use LLVM source-based code coverage (-C instrument-coverage).
Apache License 2.0

Custom LLVM_PROFILE_FILE Modes #335

Closed cwfitzgerald closed 8 months ago

cwfitzgerald commented 8 months ago

In wgpu's tests, with 1000 or so tests and a large test binary, we are running out of space holding all the intermediate profraw files. Each one is about 20MB, and with a thousand of them it easily eats through the 14GB provided by GitHub Actions without paying for more.

Looking at https://doc.rust-lang.org/rustc/instrument-coverage.html#running-the-instrumented-binary-to-generate-raw-coverage-profiling-data it seems like there are modes which consolidate the data into a few files. Notably %Nm looks very useful for this use case.

It would be nice to have a flag, or some other mechanism, to get access to this. There is a limit on the pool size (N can be at most 9, per the docs), so it isn't a good fit for all situations, but on a constrained system like CI that restriction is fine.
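
For concreteness, a pattern along these lines (the file name, pool size, and binary path here are just illustrative examples of the %Nm mode described in the linked rustc docs) would merge the raw profiles from all test processes into a shared pool of at most N files instead of one file per process:

# illustrative only: at most 4 profraw files total, shared across processes
LLVM_PROFILE_FILE='wgpu-%4m.profraw' ./path/to/instrumented-test-binary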

taiki-e commented 8 months ago

Looking at https://doc.rust-lang.org/rustc/instrument-coverage.html#running-the-instrumented-binary-to-generate-raw-coverage-profiling-data it seems like there are modes which consolidate the data into a few files. Notably %Nm looks very useful for this use case.

I'm not sure that will fix the problem. If you are using nextest, you are already using it:

https://github.com/taiki-e/cargo-llvm-cov/blob/e3dd19fe090eb27ec37f8c41ed129c0aa838054e/src/main.rs#L214-L230


From experience, my impression is that when there is a disk usage problem, it is easier to just free up disk space than to look for workarounds to reduce disk usage (https://github.com/taiki-e/portable-atomic/commit/de87ecc7fa4918fb82a69101e8ab9f7ad807718a, https://github.com/taiki-e/workflows/commit/382d34484e6ae7ce7d9aa2892f16682561befa6f).
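
(For reference, the linked commits free space by deleting large preinstalled tool trees on the hosted runners before the job runs; something along these lines, though the exact paths are an assumption and vary by runner image:)

# assumption: a typical space-freeing step on GitHub-hosted Ubuntu runners
sudo rm -rf /usr/share/dotnet /usr/local/lib/android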

cwfitzgerald commented 8 months ago

From experience, my impression is that when there is a disk usage problem, it is easier to just free up disk space than to look for workarounds to reduce disk usage

Unfortunately, at the rate our tests' disk usage is growing, that only buys us a little time before we need to start using the larger GitHub runners with more space.

I'm not sure that will fix the problem. If you are using nextest you are already using it.

Interesting - the idea I was thinking of was using {}-%{}m instead of {}-%p-%{}m so that as much re-use as possible happens. I wonder if the PID behavior on Windows causes more usage than on Linux/macOS, where we didn't hit this problem.
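
To make the difference concrete, the two expanded LLVM_PROFILE_FILE values would look roughly like this (the name and pool size are illustrative, not the exact values cargo-llvm-cov emits):

# current: %p puts the PID in the name, so each process gets its own pool of files
LLVM_PROFILE_FILE='wgpu-%p-%4m.profraw'
# proposed: drop %p so all processes share one pool of at most 4 files
LLVM_PROFILE_FILE='wgpu-%4m.profraw'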

taiki-e commented 8 months ago

the idea I was thinking of was using {}-%{}m instead of {}-%p-%{}m so that as much re-use as possible happens.

Ah, interesting.

I've implemented this in the llvm-profile-file-name branch. Could you try it and see if it works as expected?

cargo install cargo-llvm-cov --git https://github.com/taiki-e/cargo-llvm-cov --branch llvm-profile-file-name

LLVM_PROFILE_FILE_NAME='<name>.profraw' cargo llvm-cov ...
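
For example, using the %Nm merge-pool mode (the concrete name here is just an example, matching the value tried below):

LLVM_PROFILE_FILE_NAME='wgpu-%8m.profraw' cargo llvm-cov ...
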
cwfitzgerald commented 8 months ago

Nice!

Using the default profile name, running wgpu's 1000-test test suite on my Windows machine results in 14.3 GB of profraw files.

Using "wgpu-%8n.profraw" as the profile name, I get 26 (264MB) profraw files.

Using "wgpu-%n.profraw" as the profile name, I get 13 (125MB) profraw files. This == the amount of binaries we have in our test suite, meaning it won't scale up as we add more tests.

I built the reports from the various runs and they were completely identical. Additionally, changing the number of allowed profiles didn't seem to affect runtime at all, though our tests are generally longer-running (~4s each). Recorded on a 20-core machine.

This is a massive win for us, and completely eliminates the problem. I might actually suggest that the nextest default be {}-%Nm, as this feels like a much better default.

cwfitzgerald commented 8 months ago

Just tested with a subset of our tests that are very fast (<0.1s each), and the profile file name setting makes no difference in suite runtime.

taiki-e commented 8 months ago

Published in v0.6.2. Thanks for testing this!

I might actually suggest that the nextest default be {}-%Nm, as this feels like a much better default.

Yeah, it would be preferable to do so if it would not affect the accuracy.