rust-lang / rust

Malformed coverage data with a `no_mangle` function #117788

Closed: fsavy-tehtris closed this 10 months ago

fsavy-tehtris commented 11 months ago

Code

I tried to run cargo llvm-cov on this code:

use abi_stable::utils::{ffi_panic_message, PanicInfo};

#[no_mangle]
pub extern "C" fn lib_get_module() {
    ffi_panic_message(&PanicInfo {
        file: "abc",
        line: 0,
    });
}

I expected the tool to work normally, but instead, this happened:

error: Failed to load coverage: 'target/llvm-cov-target/debug/deps/cov_bug_repro-3c4d51dd97237a75': Malformed coverage data

Please note this is the most minimal reproducible example I've been able to find.

Removing either the no_mangle attribute or the ffi_panic_message call makes the error go away, but removing extern "C" has no effect. I've tried to remove the abi_stable dependency by putting the required code in the same file or in a different crate (in the same workspace), but neither variant reproduces the error.

Associated abi_stable code

pub mod utils {
    /// Information about a panic, used in `ffi_panic_message`.
    #[derive(Debug, Copy, Clone)]
    pub struct PanicInfo {
        ///
        pub file: &'static str,
        ///
        pub line: u32,
    }

    /// Prints an error message for attempting to panic across the
    /// ffi boundary and aborts the process.
    #[inline(never)]
    #[cold]
    pub fn ffi_panic_message(info: &'static PanicInfo) -> ! {
        eprintln!("\nfile:{}\nline:{}", info.file, info.line);
        eprintln!("Attempted to panic across the ffi boundary.");
        eprintln!("Aborting to handle the panic...\n");
        std::process::exit(1);
    }
}

Version it worked on

It most recently worked on: 1.73.0

Version with regression

rustc --version --verbose:

rustc 1.74.0-beta.5 (efc300e54 2023-11-07)
binary: rustc
commit-hash: efc300e5460fd1ed057b882e9e29adfdd217eeef
commit-date: 2023-11-07
host: x86_64-unknown-linux-gnu
release: 1.74.0-beta.5
LLVM version: 17.0.4

@rustbot modify labels: +regression-from-stable-to-beta -regression-untriaged

lqd commented 11 months ago

cc @Zalathar

Zalathar commented 11 months ago

I was able to reproduce this on beta+nightly on my machine (macOS aarch64), so I should be able to dig into the mappings and see what LLVM is complaining about.

(Thanks for the detailed report!)

Zalathar commented 11 months ago

This seems to be where the Malformed coverage data error is coming from:

https://github.com/rust-lang/llvm-project/blob/fef3d7b14ede45d051dc688aae0bb8c8b02a0566/llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp#L340-L344

Zalathar commented 11 months ago

When the error occurs in the above code, we have startLoc = (159, 59) and endLoc = (1, 4), causing the check to fail.

That seems to correspond to this code:

https://github.com/rodrimati1992/abi_stable_crates/blob/f2485e20058294ef14d414d391d63ddb2a99ea69/abi_stable/src/macros/internal.rs#L137-L161
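
As a rough illustration of the check that fails here, the sketch below compares a start and end coordinate the way a lexicographic pair comparison would (line first, then column). The `LineCol` alias and the `region_is_ordered` function are hypothetical names for this example, not LLVM or rustc APIs.

```rust
/// A (line, column) source coordinate, standing in for the pairs the
/// thread calls `startLoc` and `endLoc`. Illustrative alias only.
type LineCol = (u32, u32);

/// Returns true when the region's start does not come after its end.
/// Rust tuples compare lexicographically: line first, then column.
fn region_is_ordered(start: LineCol, end: LineCol) -> bool {
    start <= end
}

fn main() {
    // Coordinates reported in this thread: the start lies after the end.
    assert!(!region_is_ordered((159, 59), (1, 4)));

    // A made-up region on the same start line that would pass the check.
    assert!(region_is_ordered((159, 59), (161, 4)));
}
```

With the values from this report, the start line (159) is greater than the end line (1), so the region is rejected regardless of the columns.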

Zalathar commented 11 months ago

If I print out the raw values being read by readMappingRegionsSubArray, I see:

[RAW] LineStartDelta = 159; ColumnStart = 59; NumLines = 4294967138; ColumnEnd = 4

That NumLines value is 0xFFFFFF62, which is clearly bogus. Probably the result of integer overflow somewhere.
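
The arithmetic can be checked with a small sketch. It assumes the line count is derived as "end line minus start line" in unsigned 32-bit arithmetic; the thread does not say where the wraparound actually happens, and `num_lines_wrapping` is a hypothetical helper, not a function from rustc or LLVM.

```rust
/// Hypothetical helper: the number of lines a region extends past its
/// start line, computed with wrapping unsigned 32-bit subtraction.
fn num_lines_wrapping(start_line: u32, end_line: u32) -> u32 {
    end_line.wrapping_sub(start_line)
}

fn main() {
    // Start line 159 and end line 1, as seen in this report.
    let num_lines = num_lines_wrapping(159, 1);

    // 1 - 159 wraps around to 2^32 - 158.
    assert_eq!(num_lines, 4_294_967_138);
    assert_eq!(num_lines, 0xFFFF_FF62);

    // Reinterpreting the same bits as a signed value gives -158.
    assert_eq!(num_lines as i32, -158);

    println!("NumLines = {num_lines} ({num_lines:#X})");
}
```

The raw value is therefore consistent with an end line of 1 and a start line of 159, matching the (159, 59) and (1, 4) coordinates seen earlier.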

Zalathar commented 11 months ago

Actually I've traced the bogus (1, 4) coordinates to the actual span processed by make_code_region.

It looks like maybe_push_macro_name_span is producing a bogus span that extends into an adjacent file, completely trashing the span coordinates.

At this point I'm guessing the culprit is #116754.
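
To show why a span that crosses a file boundary produces nonsensical coordinates, here is a toy model. It assumes all files share one global byte-offset space and that each endpoint is resolved to a line and column within whichever file contains it. `SourceFile` and `lookup` are invented for this sketch and are not rustc's actual source map API; the file contents are made up.

```rust
/// Toy source file living in a shared, global byte-offset space.
struct SourceFile {
    name: &'static str,
    start: usize, // global offset of the file's first byte
    text: &'static str,
}

/// Resolves a global offset to (file name, 1-based line, 1-based column).
/// Assumes ASCII text so byte offsets are valid slice boundaries.
fn lookup(files: &[SourceFile], pos: usize) -> Option<(&'static str, u32, u32)> {
    let file = files
        .iter()
        .find(|f| pos >= f.start && pos < f.start + f.text.len())?;
    let before = &file.text[..pos - file.start];
    let line = before.matches('\n').count() as u32 + 1;
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = (before.len() - line_start) as u32 + 1;
    Some((file.name, line, col))
}

fn main() {
    let files = [
        SourceFile { name: "a.rs", start: 0, text: "fn a() {}\nfn b() {}\n" },
        SourceFile { name: "b.rs", start: 20, text: "mod m;\n" },
    ];

    // A bogus span whose low end is in a.rs and whose high end is in b.rs.
    let (lo, hi) = (13, 23);
    let (lo_file, lo_line, lo_col) = lookup(&files, lo).unwrap();
    let (hi_file, hi_line, hi_col) = lookup(&files, hi).unwrap();

    assert_eq!((lo_file, lo_line, lo_col), ("a.rs", 2, 4));
    assert_eq!((hi_file, hi_line, hi_col), ("b.rs", 1, 4));

    // Treated as one region, the start now comes after the end.
    assert!((lo_line, lo_col) > (hi_line, hi_col));
}
```

The two endpoints are each valid on their own, but they describe positions in different files, so using them as a single region yields an end that precedes the start.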

Zalathar commented 11 months ago

Hmm, but I can reproduce this on nightly-2023-10-15, which predates the merge of #116754, so I should look at changes earlier than that.

Zalathar commented 11 months ago

Manually bisected this to the range between 2023-09-05 (good) and 2023-09-06 (bad).

Zalathar commented 11 months ago

After some more manual bisection, I'm pretty sure this started failing after #115507. That seems consistent with it being a latent coverage bug that has now been made more common by compiler-wide changes to spans.

Zalathar commented 11 months ago

Even in prior revisions I can observe the spans created by check_invoked_macro_name_span sometimes being bogus; they're just bogus in a way that tends not to result in malformed mappings and llvm-cov failures.

lqd commented 11 months ago

(In this situation, cargo-bisect-rustc should work fine to avoid bisecting manually)

Zalathar commented 11 months ago

The fix for this has been cherry-picked into the upcoming stable release of 1.74, and has also been merged on nightly (1.76).

It will probably still be broken in the initial version of the upcoming beta release (1.75); ideally the fix will also be accepted on the beta branch at some point in the next 5 weeks.