rust-lang / rust


Nightly - LLVM code coverage - Issue with generating code coverage when using Prometheus crate #79645

Closed BenChand closed 3 years ago

BenChand commented 3 years ago

I've been encountering a problem generating code coverage for a project that uses the prometheus crate, specifically versions 0.9 onwards. Using these versions of the crate results in the error "Failed to load coverage: Truncated coverage data" when attempting to build the final coverage report. I've attempted to reduce the problem to a small test project here:

https://github.com/BenChand/prometheus-code-cov

I'm currently using nightly-2020-11-25. I've also listed 3 ways that I am able to get code coverage working in that example project when I change parts related to the prometheus crate.

I've tried with these versions of LLVM:

11.0.0-rust-1.50.0-nightly
11.0.1

Expected:

$ rm *.prof*
$ cargo clean
$ RUSTFLAGS="-Z instrument-coverage" LLVM_PROFILE_FILE="cargo-%m.profraw" cargo +nightly-2020-11-25 test --tests
$ cargo +nightly-2020-11-25 profdata -- merge -sparse *.profraw -o out.profdata
$ cargo cov -- report --instr-profile=./out.profdata target/debug/deps/prometheus_code_cov-24bcfe5d001389f5
...
src/lib.rs                                                                                                                12                 8    33.33%           6                 2    66.67%          24                 9    62.50%
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL    

Instead, this happened:

$ rm *.prof*
$ cargo clean
$ RUSTFLAGS="-Z instrument-coverage" LLVM_PROFILE_FILE="cargo-%m.profraw" cargo +nightly-2020-11-25 test --tests
...
     Running target/debug/deps/prometheus_code_cov-24bcfe5d001389f5

running 1 test
test tests::it_returns_1 ... ok
...
$ cargo +nightly-2020-11-25 profdata -- merge -sparse *.profraw -o out.profdata
$ cargo cov -- report --instr-profile=./out.profdata target/debug/deps/prometheus_code_cov-24bcfe5d001389f5
error: target/debug/deps/prometheus_code_cov-24bcfe5d001389f5: Failed to load coverage: Truncated coverage data

Meta

Rust version: nightly-2020-11-25

@richkadel

richkadel commented 3 years ago

We should try this again with the changes in PR #79109, which should be landing in nightly soon (a day or two?)

richkadel commented 3 years ago

@BenChand - Would you mind retrying this with the latest nightly? PR #79109 landed about a week ago. There were several fixes and improvements, including fixes for similar problems. It may work now. If not, at least I know where to start looking. LMK.

Thanks!

BenChand commented 3 years ago

@richkadel - I've retried with nightly-2020-12-09 and it seems to work now!

richkadel commented 3 years ago

@BenChand - Great! Feel free to close this issue if you are satisfied. Thanks!

gkorland commented 2 years ago

A day ago we started getting this error again on nightly: "Failed to load coverage: Truncated coverage data". @richkadel, it seems to work fine on nightly-2022-01-09.

richkadel commented 2 years ago

@wesleywiser I assume this is related to your recent change. We should review the thread and see if there's a clue as to why it regressed.

wesleywiser commented 2 years ago

@gkorland are you seeing this with the Prometheus crate or some other crate?

davidhewitt commented 2 years ago

Can't speak for @gkorland, however I'm also seeing this on the PyO3 CI: https://github.com/PyO3/pyo3/runs/4833765075?check_suite_focus=true

richkadel commented 2 years ago

@wesleywiser - I downloaded the original test case mentioned at the top of this issue, and confirmed I get the error message. Unfortunately, as we've seen with similar error messages from the LLVM CoverageMappingReader, the message is almost useless. It's generated when the code calls return make_error<CoverageMapError>(coveragemap_error::truncated);, but it generates the same error message from six different code locations, and there's no way that I'm aware of to know which line this error is coming from. (And it would be helpful to get more context, i.e., to print the module and function for the bad data being parsed at the time the coverage reader fails.)

It looks like this specific message is generated when CoverageMappingReader expects some data that is not there.
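To illustrate the diagnostics problem described above, here is a minimal Rust sketch. It is hypothetical and is not LLVM's actual C++ CoverageMappingReader code: the `CoverageError` type, both reader functions, and the `site` and `function` labels are names invented for illustration. It contrasts a bare "truncated" error, which looks identical no matter which read failed, with one that records where the reader was and what it was parsing.

```rust
use std::fmt;

/// Hypothetical error type for illustration only (not LLVM's CoverageMapError).
#[derive(Debug, PartialEq)]
enum CoverageError {
    /// Context-free variant: every failing read site produces the same value.
    Truncated,
    /// Variant that carries the context the comment above wishes for.
    TruncatedAt {
        site: &'static str, // which read in the parser failed (assumed label)
        function: String,   // function whose mapping data was being parsed
        offset: usize,      // where the 4-byte read was attempted
        len: usize,         // how many bytes the buffer actually holds
    },
}

impl fmt::Display for CoverageError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            CoverageError::Truncated => write!(f, "Truncated coverage data"),
            CoverageError::TruncatedAt { site, function, offset, len } => write!(
                f,
                "Truncated coverage data at {} while reading {}: needed 4 bytes at offset {}, buffer has {}",
                site, function, offset, len
            ),
        }
    }
}

/// Returns the 4 bytes starting at `offset`, or None if the buffer is too short.
fn slice4(data: &[u8], offset: usize) -> Option<[u8; 4]> {
    let end = offset.checked_add(4)?;
    let b = data.get(offset..end)?;
    Some([b[0], b[1], b[2], b[3]])
}

/// Reads a little-endian u32 and reports failure without any context.
fn read_u32_bare(data: &[u8], offset: usize) -> Result<u32, CoverageError> {
    slice4(data, offset)
        .map(u32::from_le_bytes)
        .ok_or(CoverageError::Truncated)
}

/// Reads a little-endian u32 and reports which read failed and for which function.
fn read_u32_with_context(
    data: &[u8],
    offset: usize,
    site: &'static str,
    function: &str,
) -> Result<u32, CoverageError> {
    slice4(data, offset)
        .map(u32::from_le_bytes)
        .ok_or_else(|| CoverageError::TruncatedAt {
            site,
            function: function.to_string(),
            offset,
            len: data.len(),
        })
}
```

With the bare variant, a failure in any of several read sites yields the same "Truncated coverage data" text, which matches the situation described above. The contextual variant costs a few extra fields but would let a user see which record and which function the reader was processing when the data ran out.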

It's not clear which change in #79109 fixed this problem. That was a large PR that addressed many things in Rust coverage, made while I was in the thick of developing it. But I see one commit (within a rebased, combined commit) in that PR that says:

Restrict adding unreachable regions to covered files

Improved the code that adds coverage for uncalled functions (with MIR but not-codegenned) to avoid generating coverage in files not already included in the files with covered functions.

Based on your change in #92142, my guess (but still very much a guess) is that this commit may have fixed the original issue, and that your change may well have undone the fix. There is a lot of cross-referencing in the LLVM-IR between variables that point to the coverage mapping data and variables that point to the actual functions, within the same module. Since you tried to put all unused functions in the same module, perhaps the coverage mapping data is in a different module than the function definition? (I don't think that's the case, but it's worth double checking.) Or maybe the module you picked is getting garbage collected/dropped at link time in this case. (I think we talked about this possibility.)

I tried adding -C link-dead-code, by the way, and I still got the "Truncated" error. I kind of thought that would fix it, but it didn't.

davidhewitt commented 2 years ago

Just wanted to say thank you for working on coverage in general @richkadel @wesleywiser - the quality of coverage measurements in PyO3 keeps getting better and better, and I'm looking forward to having -Z instrument-coverage on stable one day 😊

kgrech commented 2 years ago

I can confirm that I am also seeing the same issue with the latest nightly toolchain.

mimoo commented 2 years ago

hitting that issue too o.o

wesleywiser commented 2 years ago

#93144 was just merged a few hours ago and should resolve these issues. It should be available in the next nightly toolchain release. If you run into further issues, please open a new issue so we can make sure it gets resolved appropriately. Thanks!