rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Malformed coverage data when using llvm-cov #119453

Open StackOverflowExcept1on opened 8 months ago

StackOverflowExcept1on commented 8 months ago

I tried to fuzz our Substrate runtime, but I'm getting this error:

error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': Malformed coverage data
How to reproduce (on our large repo)

```bash
git clone --branch av/rust-1.76-support https://github.com/gear-tech/gear.git
cd gear
git checkout 849dbb301c751c951754b73b39a50a02e7296bef
cd utils/runtime-fuzzer

mkdir -p fuzz/corpus/main
dd if=/dev/urandom of=fuzz/corpus/main/fuzzer-seed-corpus bs=1 count=350000

# Run the fuzzer for at least 3 minutes, then press Ctrl-C to stop fuzzing.
cargo fuzz run \
    --release \
    --sanitizer=none \
    main \
    fuzz/corpus/main \
    -- \
    -rss_limit_mb=8192 \
    -max_len=450000 \
    -len_control=0

cargo fuzz coverage \
    --release \
    --sanitizer=none \
    main \
    fuzz/corpus/main \
    -- \
    -rss_limit_mb=8192 \
    -max_len=450000 \
    -len_control=0

HOST_TARGET=$(rustc -Vv | grep "host: " | sed "s/^host: \(.*\)$/\1/")

cargo cov -- show target/$HOST_TARGET/coverage/$HOST_TARGET/release/main \
    --format=text \
    --show-line-counts \
    --Xdemangler=rustfilt \
    --instr-profile=fuzz/coverage/main/coverage.profdata \
    --ignore-filename-regex=/rustc/ \
    --ignore-filename-regex=.cargo/ &> fuzz/coverage/main/coverage.txt
```

Meta

rustc --version --verbose:

rustc 1.77.0-nightly (3cdd004e5 2023-12-29)
binary: rustc
commit-hash: 3cdd004e55c869faa2b7b25efd3becf50346e7d6
commit-date: 2023-12-29
host: x86_64-unknown-linux-gnu
release: 1.77.0-nightly
LLVM version: 17.0.6

StackOverflowExcept1on commented 8 months ago

@Zalathar I don't know how to reproduce this with a minimal example, but I would appreciate it if you could somehow debug this in the LLVM code. As you wrote earlier, this comes from llvm-cov. It throws coveragemap_error::malformed somewhere, but without any backtrace.

StackOverflowExcept1on commented 8 months ago

I debugged LLVM, and the error comes from the same place: https://github.com/rust-lang/llvm-project/blob/fef3d7b14ede45d051dc688aae0bb8c8b02a0566/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L340-L344 But I have no idea how to get the source file from ExpandedFileID to provide more info.

Zalathar commented 8 months ago

Pinpointing the LLVM error was very helpful, thanks. Without that I wouldn’t be able to do much.

Can you give me an example of the LineStart/ColumnStart/NumLines/ColumnEnd values that are present when it fails? That should give me a better idea of what’s going wrong on the Rust side.

Zalathar commented 8 months ago

(I might not be able to track down the underlying issue, but hopefully I should at least be able to add some extra checks to the compiler to make sure LLVM doesn’t encounter a fatal error.)

StackOverflowExcept1on commented 8 months ago

@Zalathar

LineStart=652 LineStartDelta=0 ColumnStart=9 NumLines=0 ColumnEnd=10
Counter in file 0 652:9 -> 652:10, #44
LineStart=1112 LineStartDelta=460 ColumnStart=1 NumLines=4294966836 ColumnEnd=10
Counter in file 0 1112:1 -> 4294967948:10, #0
here 1112, 1652, 10
error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': malformed coverage data
#10

Zalathar commented 8 months ago

I've submitted a possible workaround in #119460.

It doesn't address the underlying question of why we were producing improper regions in the first place, but it does at least mean that we will detect and discard those regions early, instead of emitting them and having llvm-cov fail.
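As a rough illustration of what "detect and discard" could look like, here is a minimal Rust sketch. It is not the code from #119460: the `Region` struct and the `is_well_formed` and `discard_malformed` helpers are hypothetical names invented for this example, and the sketch assumes a region is acceptable only when its coordinates are 1-based and its start does not come after its end.

```rust
/// Hypothetical line/column region, loosely modelled on a coverage mapping region.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Region {
    start_line: u32,
    start_col: u32,
    end_line: u32,
    end_col: u32,
}

/// Assumed validity rule: coordinates are 1-based and (start) <= (end),
/// comparing line first and column second.
fn is_well_formed(r: &Region) -> bool {
    r.start_line >= 1
        && r.start_col >= 1
        && (r.start_line, r.start_col) <= (r.end_line, r.end_col)
}

/// Keep only the regions that pass the check, dropping the rest early.
fn discard_malformed(regions: Vec<Region>) -> Vec<Region> {
    regions.into_iter().filter(is_well_formed).collect()
}

fn main() {
    // 652:9 -> 652:10 appears in the log above and is properly ordered.
    let good = Region { start_line: 652, start_col: 9, end_line: 652, end_col: 10 };
    // A region whose end line precedes its start line is improper.
    let bad = Region { start_line: 1112, start_col: 1, end_line: 652, end_col: 10 };

    let kept = discard_malformed(vec![good, bad]);
    println!("{kept:?}");
}
```

Dropping such a region loses coverage information for that span, but the rest of the coverage data remains loadable.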

Zalathar commented 8 months ago

NumLines=4294966836 is 0xFFFF_FE34, which is -460 (relative to LineStart=1112).

This suggests that the original coordinates were 1112:1 -> 652:10.
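The arithmetic can be checked with a short Rust sketch. The helper names `signed_line_delta` and `recovered_line_end` are made up for this example, and it assumes that `NumLines` is an unsigned 32-bit value that wrapped around when the end line was smaller than the start line.

```rust
/// Reinterpret the unsigned 32-bit `NumLines` value as a signed delta.
fn signed_line_delta(num_lines: u32) -> i32 {
    num_lines as i32
}

/// Recover the end line by adding the wrapped delta with 32-bit wraparound.
fn recovered_line_end(line_start: u32, num_lines: u32) -> u32 {
    line_start.wrapping_add(num_lines)
}

fn main() {
    let line_start: u32 = 1112;
    let num_lines: u32 = 4_294_966_836; // value reported in the log

    assert_eq!(num_lines, 0xFFFF_FE34);
    println!("delta    = {}", signed_line_delta(num_lines)); // -460
    println!("line end = {}", recovered_line_end(line_start, num_lines)); // 652
}
```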

Zalathar commented 8 months ago

Spans within the compiler are always properly-ordered in terms of byte positions (enforced in Span::new), so I believe the only way to end up with improper line/column coordinates is if the span starts and ends in different files.
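To make that hypothesis concrete, here is a toy Rust model. It is not rustc's actual source map: `ToyFile`, `lookup`, the file names, and the fixed 10-byte line length are all assumptions made for illustration. It shows how two byte positions can be properly ordered while their line numbers appear reversed, provided they fall in different files.

```rust
/// Toy source file occupying a contiguous range of a shared byte-offset space.
struct ToyFile {
    name: &'static str,
    start: u32,     // global offset of the file's first byte
    num_lines: u32, // number of lines in the file
}

/// Every line is exactly 10 bytes long in this toy model.
const LINE_LEN: u32 = 10;

impl ToyFile {
    /// Global offset one past the file's last byte.
    fn end(&self) -> u32 {
        self.start + self.num_lines * LINE_LEN
    }
}

/// Map a global byte position to (file name, 1-based line, 1-based column).
fn lookup(files: &[ToyFile], pos: u32) -> Option<(&'static str, u32, u32)> {
    let file = files.iter().find(|f| f.start <= pos && pos < f.end())?;
    let rel = pos - file.start;
    Some((file.name, rel / LINE_LEN + 1, rel % LINE_LEN + 1))
}

fn main() {
    let files = [
        ToyFile { name: "a.rs", start: 0, num_lines: 2000 },
        ToyFile { name: "b.rs", start: 20_000, num_lines: 1000 },
    ];

    let lo = 1111 * LINE_LEN; // line 1112, column 1 of a.rs
    let hi = 20_000 + 651 * LINE_LEN + 9; // line 652, column 10 of b.rs
    assert!(lo < hi); // byte positions are properly ordered

    // The line numbers are reversed because the endpoints are in different files.
    println!("{:?} -> {:?}", lookup(&files, lo), lookup(&files, hi));
}
```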

I'm not sure how we're ending up with a span like that. I initially suspected fn_sig_span, but it turns out that we require fn_sig_span and body_span to start in the same file, so that probably isn't the cause.

Zalathar commented 8 months ago

I wonder if the span adjustment in filtered_terminator_span for TerminatorKind::Call could be responsible, as it combines the endpoints of two potentially-unrelated spans.

Though if the resulting span crosses file boundaries, I would expect it to be discarded by unexpand_into_body_span, unless the body span also crosses file boundaries. But we don't do any ad-hoc manipulation of the endpoints of the body span itself, so the file-crossing span would have to already be present in MIR before coverage gets involved. I don't know whether that's possible or not.

StackOverflowExcept1on commented 8 months ago

@Zalathar Is there a way to find out which source file is causing the problem? I think it would be fairly easy to patch the llvm-cov source code, rebuild it, and then put the result in ~/.rustup/toolchains/<toolchain>/lib/rustlib/...

Zalathar commented 8 months ago

If you look at method RawCoverageMappingReader::readMappingRegionsSubArray, you should see a parameter unsigned InferredFileID.

That file ID should be an index into the field std::vector<StringRef> &Filenames in RawCoverageMappingReader.

So you might be able to rig up something to print out that filename.

StackOverflowExcept1on commented 8 months ago

The problem comes from the macro construct_runtime!(): https://github.com/gear-tech/gear/blob/av/rust-1.76-support/runtime/vara/src/lib.rs#L1112

Counter in file 0 1112:1 -> 4294967948:10, #0
/home/.../work/gear/runtime/vara/src/lib.rs
LineStart=1112 LineStartDelta=460 ColumnStart=1 NumLines=4294966836 ColumnEnd=10
error: Failed to load coverage: 'target/x86_64-unknown-linux-gnu/coverage/x86_64-unknown-linux-gnu/release/main': malformed coverage data

Zalathar commented 8 months ago

That looks like the source of the 1112:1, but it leaves me puzzled as to where the 652:10 is coming from.

I don’t think those coordinates point anywhere meaningful in the same file, but I’m not sure what other file they could be trying to refer to.

StackOverflowExcept1on commented 8 months ago

I don’t know what the hell is going on at 652:9 -> 652:10, but I can attach the full log: coverage.txt

Zalathar commented 8 months ago

Another piece of information that might be useful is the name of the function that has the malformed region in its coverage mappings.

It looks like you should be able to dump it from R.FunctionName in BinaryCoverageReader::readNextRecord.

But you might need to dump it before the call to Reader.read(), because I believe that's the call that fails when it encounters the bad region.

atodorov commented 6 months ago

FTR, I was seeing the same issue when running cargo llvm-cov with another Substrate-based chain. rustc 1.77.0-nightly fixed it for me. I will wait for this to become stable and then upgrade.

@Zalathar thanks for your patch!