rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Adding a timeout command when running the executable file could cause wrong coverage result #113088

Open cicilzx opened 1 year ago

cicilzx commented 1 year ago

I'm not sure whether this is a bug or a mistake on my part. Here is what I did:

  1. A simple Rust code:

    fn main() {}
  2. Build, run, and get coverage information:

    rustc -C instrument-coverage file.rs
    ./file
    grcov . --binary-path ./file -s ./file.rs -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o ./lcov1.info

    The result looks fine:

    TN:
    SF:
    FN:1,file10::main
    FNDA:1,file10::main
    FNF:1
    FNH:1
    BRF:0
    BRH:0
    DA:1,1
    DA:2,1
    DA:3,1
    LF:3
    LH:3
    end_of_record
  3. But if I add a timeout command like this:

    rustc -C instrument-coverage file.rs
    timeout 30m ./file
    grcov . --binary-path ./file -s ./file.rs -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o ./lcov2.info

    The coverage result is output like this:

    TN:
    SF:
    FN:1,file10::main
    FNDA:1,file10::main
    FNF:1
    FNH:1
    BRF:0
    BRH:0
    DA:1,1
    DA:2,1
    DA:3,1
    LF:3
    LH:3
    end_of_record
    SF:
    FN:1,file10::main
    FNDA:1,file10::main
    FNF:1
    FNH:1
    BRF:0
    BRH:0
    DA:1,1
    DA:2,1
    DA:3,1
    LF:3
    LH:3
    end_of_record

    I don't understand why there are two end_of_record lines, and the second group of coverage information looks odd. Is this a bug in rustc or in grcov? Or am I doing something wrong that makes the results incorrect?
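One quick way to confirm the duplication is to count the records in each tracefile, since every record in the lcov format is terminated by an `end_of_record` line. The helper below is only a minimal sketch; the function name `count_lcov_records` is made up for illustration and is not part of grcov or lcov.

```shell
# Hypothetical helper: print how many records an lcov tracefile contains.
# Each record in the lcov format ends with a line reading "end_of_record".
count_lcov_records() {
    grep -c '^[[:space:]]*end_of_record[[:space:]]*$' "$1"
}
```

For the two outputs shown above, `count_lcov_records ./lcov1.info` should print 1 and `count_lcov_records ./lcov2.info` should print 2, which makes the extra record easy to detect in a script.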

Noratrieb commented 1 year ago

cc @Zalathar you've been playing with coverage recently, maybe you have an idea what's happening here?

Zalathar commented 1 year ago

I haven't used grcov, so that part is over my head.

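In terms of simple things to try first:

  • Make sure you delete any .profraw files (and any intermediate files produced by grcov) between different runs, so they don't accidentally interfere with each other, or get accidentally merged by your analysis tools.
  • If this is on stable rustc, try beta or nightly. There has been a recent change to make coverage counters atomic, which could plausibly help here.
  • Per https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program, try setting an explicit LLVM_PROFILE_FILE environment variable that includes the %c flag, which tells the profiler runtime to continuously sync its counters to a file (instead of writing them out with an exit hook). Again, this could plausibly help with problems caused by timeout killing your process.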

Zalathar commented 1 year ago

If those don't help, it might be worth trying to reproduce the issue with LLVM's coverage reporting tools instead of grcov, processing the same .profraw files that did/didn't produce problems.

You'll need the llvm-tools-preview component from rustup, which includes the llvm-profdata and llvm-cov commands. They're a bit fiddly to use by hand, but you might find this useful as a reference for some of the command-line arguments you'll want to use:

https://github.com/rust-lang/rust/blob/6162f6f12339aa81fe16b8a64644ead497e411b2/src/tools/compiletest/src/runtest.rs#L512-L537

(This is part of Rust's test suite so it's executing the commands from Rust code; ignore that and focus on the command-line arguments.)
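As a rough illustration of the LLVM-only workflow described above, the sketch below merges a raw profile with `llvm-profdata merge` and then exports lcov data with `llvm-cov export`. The function name `export_lcov` and the `LLVM_PROFDATA` / `LLVM_COV` variables are assumptions introduced here, not something taken from the linked test harness. The tools installed by rustup typically live inside the toolchain directory rather than on PATH, so the two variables may need to point at their full paths.

```shell
# Tool locations are assumptions; override these variables if the tools
# are not on PATH.
: "${LLVM_PROFDATA:=llvm-profdata}"
: "${LLVM_COV:=llvm-cov}"

# Hypothetical helper: merge one raw profile and write lcov data to stdout.
# Usage: export_lcov BINARY PROFRAW PROFDATA
export_lcov() {
    binary=$1
    profraw=$2
    profdata=$3
    # Merge the raw counters into an indexed profile.
    "$LLVM_PROFDATA" merge -sparse "$profraw" -o "$profdata" || return 1
    # Map the counters back to source regions using the instrumented binary.
    "$LLVM_COV" export -format=lcov "$binary" -instr-profile="$profdata"
}
```

A call such as `export_lcov ./file run.profraw run.profdata > llvm.info` (where `run.profraw` stands for whichever raw profile the run produced) gives an lcov file that can be compared against the grcov output for the same run.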

cicilzx commented 1 year ago

I haven't used grcov, so that part is over my head.

In terms of simple things to try first:

  • Make sure you delete any .profraw files (and any intermediate files produced by grcov) between different runs, so they don't accidentally interfere with each other, or get accidentally merged by your analysis tools.
  • If this is on stable rustc, try beta or nightly. There has been a recent change to make coverage counters atomic, which could plausibly help here.
  • Per https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program, try setting an explicit LLVM_PROFILE_FILE environment variable that includes the %c flag, which tells the profiler runtime to continuously sync its counters to a file (instead of writing them out with an exit hook). Again, this could plausibly help with problems caused by timeout killing your process.
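The first and third suggestions in the list above can be sketched as two small shell helpers. Both function names are hypothetical and only serve as illustration. `%p` (process ID) and `%c` (continuous mode) are pattern specifiers from the Clang documentation linked above; whether continuous mode is actually available depends on the platform and on the LLVM version bundled with rustc.

```shell
# Hypothetical helpers; the names are illustrative, not part of any tool.

# Delete stale raw profiles under a directory (default: current directory)
# so that an earlier run cannot be merged into the next report.
clean_profraw() {
    find "${1:-.}" -type f -name '*.profraw' -exec rm -f {} +
}

# Run a command with an explicit LLVM_PROFILE_FILE pattern.
# Usage: run_with_profile PATTERN COMMAND [ARGS...]
run_with_profile() {
    pattern=$1
    shift
    LLVM_PROFILE_FILE="$pattern" "$@"
}
```

For the scenario in this issue, that could look like `clean_profraw . && run_with_profile 'file-%p%c.profraw' timeout 30m ./file`, followed by the usual grcov invocation.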

Thanks a lot! I tried both the nightly and stable versions and found that the calculated coverage is correct only if both the .profraw file and the compiled binary are removed; otherwise the problem I described occurs. Removing only the .profraw file is not enough: ./file has to be removed as well. With the following sequence, there is no bug:

    rustc -C instrument-coverage file.rs
    ./file
    ./file # (or: timeout 30m ./file); no need to remove llvm-profdata
    grcov . --binary-path ./file -s ./file.rs -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o ./lcov.info

But with the following sequence, the coverage file is wrong:

    rustc -C instrument-coverage file.rs
    ./file
    grcov . --binary-path ./file -s ./file.rs -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o ./lcov1.info
    # remove llvm-profdata and keep ./file
    timeout 30m ./file
    grcov . --binary-path ./file -s ./file.rs -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o ./lcov2.info

Maybe this is caused by grcov?

Zalathar commented 1 year ago

From that description, it sounds like the problem is more likely to be on the grcov side, or in how grcov is being used.

It seems strange to me that deleting the executable would help. When processing coverage with LLVM tools, you typically need the original executable, because it contains embedded metadata that is needed when mapping the raw counters in .profraw back to regions in the source code.