xd009642 / tarpaulin

A code coverage tool for Rust projects
https://crates.io/crates/cargo-tarpaulin
Apache License 2.0

Tarpaulin v0.17.0 sometimes errors with a message about a segfault when trying to build against musl #618

Closed cptpcrd closed 3 years ago

cptpcrd commented 3 years ago

Steps to reproduce

I've managed to work my code down to this minimal example:

(Clarification edit: As the title indicates, I have only encountered this when trying to build/test against musl libc; the same programs build fine when targeting glibc.)

rustup target install x86_64-unknown-linux-musl  # Or however you need to do it for your setup

cd /tmp
cargo new test-tarpaulin
cd test-tarpaulin

cat <<EOF >src/main.rs
fn a()  {
    if unsafe { libc::getpid() } < 0 {
        panic!("{}", std::io::Error::last_os_error());
    }
}
EOF
echo 'libc = "0.2"' >>Cargo.toml

cargo tarpaulin --target=x86_64-unknown-linux-musl --verbose

What should happen

tarpaulin builds the target and runs successfully. (It worked with tarpaulin v0.16.0.)

What actually happens

tarpaulin fails with a message about a segfault.

Nov 11 16:45:29.104 DEBUG cargo_tarpaulin: set up logging
Nov 11 16:45:29.104  INFO cargo_tarpaulin::config: Creating config
Nov 11 16:45:29.118  INFO cargo_tarpaulin: Running Tarpaulin
Nov 11 16:45:29.118  INFO cargo_tarpaulin: Building project
   Compiling bitflags v1.2.1
   Compiling libc v0.2.80
   Compiling test-tarpaulin v0.1.0 (/tmp/test-tarpaulin)
    Finished test [unoptimized + debuginfo] target(s) in 1.08s
Nov 11 16:45:30.257  INFO cargo_tarpaulin: Launching test
Nov 11 16:45:30.257  INFO cargo_tarpaulin: running /tmp/test-tarpaulin/target/x86_64-unknown-linux-musl/debug/deps/test_tarpaulin-86c594765747da0f
Nov 11 16:45:30.534 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests

I'm not familiar with tarpaulin's internals (not by a long shot :-), so I can't debug this any further.

Environment

I've been able to reproduce this on Arch Linux, Void Linux, and Ubuntu (all three had rustc 1.47.0 and were running either the latest kernel from the repos or only a few revisions behind).

Potential cause

It looks like this regression was caused (at least, most directly) by #613. Setting the relocation model explicitly avoids the problem: RUSTFLAGS='-C relocation-model=dynamic-no-pic' cargo tarpaulin --target=x86_64-unknown-linux-musl --verbose builds with no problems.

cptpcrd commented 3 years ago

Note: The bug referenced in ZcashFoundation/zebra#1283 is slightly different (but possibly related). It occurs when building against glibc, and the error message is different.

xiye520 commented 3 years ago

I also have the same problem. In the previous version (0.16.0), I used the following command to successfully run the tests and generate unit test coverage:

 cargo tarpaulin -v

but after upgrading to v0.17.0, the following errors began to appear:

Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 16957
Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 16954
Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 16954
Nov 12 16:41:55.984 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"

Is there a way to install a specific version of cargo-tarpaulin?

xd009642 commented 3 years ago

It's interesting that the relocation model needs to be set explicitly. I'll try some stuff with musl and see if I can get to the bottom of it... Generally, tarpaulin adding linker flags tends to cause issues for some projects, so it's interesting to see the inverse :grimacing:

@cptpcrd and @xiye518 are your tests launching external processes? I added functionality to follow exec events down if the binary was part of the project so it's likely a problem with that which wasn't caught in testing. I'll have a look at these issues tonight!

@xiye518 cargo install --version 0.16.0 cargo-tarpaulin

MitMaro commented 3 years ago

I've also been seeing similar issues for a couple of days now and have been trying to reduce the segfault to a minimal reproducible example but the segfault seems to "move" when tests are recompiled. I am using glibc.

$ cargo +nightly tarpaulin --verbose --output-dir coverage -- config
Nov 12 09:28:09.195 DEBUG cargo_tarpaulin: set up logging
Nov 12 09:28:09.195  INFO cargo_tarpaulin::config: Creating config
Nov 12 09:28:09.247  INFO cargo_tarpaulin: Running Tarpaulin
Nov 12 09:28:09.247  INFO cargo_tarpaulin: Building project
   Compiling git-interactive-rebase-tool v1.2.1 (/home/mitmaro/code/git-interactive-tool)
    Finished test [unoptimized + debuginfo] target(s) in 6.01s
Nov 12 09:28:15.546  INFO cargo_tarpaulin: Launching test
Nov 12 09:28:15.546  INFO cargo_tarpaulin: running /home/mitmaro/code/git-interactive-tool/target/debug/deps/interactive_rebase_tool-17c651823e96ab64
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x1879d0
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187a10
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b10
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b50
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b90
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187bd0
Nov 12 09:28:21.351 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x28a730
Nov 12 09:28:21.363 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x27b4c0

running 167 tests
test config::tests::config_diff_tab_invalid_range ... ok
test config::tests::config_diff_tab_invalid ... ok
test config::tests::config_diff_tab_symbol_invalid_utf8 ... ok
test config::tests::config_diff_space_symbol_invalid_utf8 ... ok
Nov 12 09:28:21.552  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994321
Nov 12 09:28:21.552  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994321
Nov 12 09:28:21.554  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994321
Nov 12 09:28:21.554  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994321
Nov 12 09:28:21.558 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"

I've also seen a SIGILL:

cargo +nightly tarpaulin --verbose --output-dir coverage -- config
Nov 12 09:28:33.491 DEBUG cargo_tarpaulin: set up logging
Nov 12 09:28:33.491  INFO cargo_tarpaulin::config: Creating config
Nov 12 09:28:33.532  INFO cargo_tarpaulin: Running Tarpaulin
Nov 12 09:28:33.532  INFO cargo_tarpaulin: Building project
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
Nov 12 09:28:33.822  INFO cargo_tarpaulin: Launching test
Nov 12 09:28:33.822  INFO cargo_tarpaulin: running /home/mitmaro/code/git-interactive-tool/target/debug/deps/interactive_rebase_tool-17c651823e96ab64
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x1879d0
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187a10
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b10
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b50
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b90
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187bd0
Nov 12 09:28:39.728 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x28a730
Nov 12 09:28:39.744 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x27b4c0

running 167 tests
test config::tests::config_diff_tab_invalid ... ok
test config::tests::config_diff_tab_invalid_range ... ok
test config::tests::config_diff_tab_symbol_invalid_utf8 ... ok
test config::tests::config_diff_space_symbol_invalid_utf8 ... ok
Nov 12 09:28:39.933  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994486
Nov 12 09:28:39.933  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994486
Nov 12 09:28:39.935  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994486
Nov 12 09:28:39.935  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994486
Nov 12 09:28:39.940 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error running test - SIGILL raised in 1994486
Error: "Failed to get test coverage! Error: Failed to run tests: Error running test - SIGILL raised in 1994486"

Project: https://github.com/MitMaro/git-interactive-rebase-tool

Failing coverage run in GitHub Actions: https://github.com/MitMaro/git-interactive-rebase-tool/runs/1390537880?check_suite_focus=true

cptpcrd commented 3 years ago

> @cptpcrd and @xiye518 are your tests launching external processes? I added functionality to follow exec events down if the binary was part of the project so it's likely a problem with that which wasn't caught in testing. I'll have a look at these issues tonight!

The reproducible example I added doesn't have any tests. When linking against musl instead of glibc, the mere presence of a function that does particular things is enough to make it segfault.

@MitMaro and @xiye518 seem to have encountered slightly different versions of this bug.

xd009642 commented 3 years ago

I think there are two bugs here. First, musl seems to cause very large address offsets to be reported, which I think is leading to mis-instrumentation.

Second, for @MitMaro and @xiye518, I can't map the thread id to a parent pid with the new exec-aware tracing. If I stub it so that a failed lookup always falls back to the root test id, it seems to always work... But that may then break tests that trace execs. If I can't find a better solution by the end of the week I'll release a patch doing that; after all, having a new feature that is broken for some projects that use it is better than breaking existing projects.
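
To illustrate the fallback described above, here is a minimal sketch, assuming the tracer keeps a simple table from thread id to parent pid. `TraceMap`, `record`, and `parent_of` are hypothetical names invented for this example; they are not tarpaulin's actual types or functions.

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping for traced threads (not tarpaulin's real types).
struct TraceMap {
    /// Pid of the root test process launched by the tracer.
    root_pid: i32,
    /// Known thread id -> parent pid associations.
    parents: HashMap<i32, i32>,
}

impl TraceMap {
    fn new(root_pid: i32) -> Self {
        TraceMap { root_pid, parents: HashMap::new() }
    }

    /// Record that `tid` belongs to the process `parent`.
    fn record(&mut self, tid: i32, parent: i32) {
        self.parents.insert(tid, parent);
    }

    /// Look up the parent of `tid`. If the thread was never recorded,
    /// fall back to the root test pid instead of reporting a failure.
    fn parent_of(&self, tid: i32) -> i32 {
        self.parents.get(&tid).copied().unwrap_or(self.root_pid)
    }
}
```

With this shape, an unknown thread id is attributed to the root test process. That matches the trade-off mentioned in the comment: a thread that really belongs to an exec'd child would be attributed to the wrong process.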

MitMaro commented 3 years ago

@xd009642 , I wish I knew more about how this project works to help out. If there is anything that I can do to provide help, please don't hesitate to ping me.

Also, thanks for the amazing project!

xd009642 commented 3 years ago

Okay, I've figured out the musl bug; I just need to implement the fix. In the process memory map, vvar and vDSO are listed before the executable, which I guess is the memory map info for musl itself. With glibc, the process itself is always first, which I thought would hold for everything.
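
As a rough sketch of the idea (not the actual fix in tarpaulin), the base address could be found by searching the memory map for the first entry backed by the test binary, rather than taking the first entry. The function name `base_address` is hypothetical; the input is assumed to be text in the format of /proc/<pid>/maps on Linux.

```rust
/// Returns the start address of the first mapping whose pathname equals
/// `exe_path`, given text in the format of /proc/<pid>/maps.
/// Entries such as [vvar] and [vdso], and anonymous mappings with no
/// pathname, are skipped. Paths containing spaces are not handled.
fn base_address(maps: &str, exe_path: &str) -> Option<u64> {
    maps.lines().find_map(|line| {
        let mut fields = line.split_whitespace();
        // Field 0 is the address range, e.g. "00400000-00401000".
        let range = fields.next()?;
        // Skip perms, offset, dev and inode; the next field is the pathname.
        let path = fields.nth(4)?;
        if path != exe_path {
            return None;
        }
        let start = range.split('-').next()?;
        u64::from_str_radix(start, 16).ok()
    })
}
```

Calling `base_address(maps_text, "/path/to/test/binary")` returns `None` when no mapping matches, so the caller can decide how to handle that case instead of instrumenting at a wrong offset.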

I'll fix this and also add a test for musl binaries to avoid regressions.

After sleeping on it I think I have a solution for the other problem I just need to do some experimentation :smile:

xd009642 commented 3 years ago

@cptpcrd your issue should be fixed in develop if you want to try that.

@MitMaro @xiye518 I have a possible solution in the branch fix_unhandled_tids. It works for git-interactive-rebase, but it still seems to time out in zebra (although it performs better than before), so I'm still investigating.

MitMaro commented 3 years ago

@xd009642, I just gave the branch a try and I can confirm that it works. Thanks for the quick update! Hopefully, the timeout issue isn't too difficult. :)

xiye520 commented 3 years ago

cargo install --version 0.16.0 cargo-tarpaulin

The command you gave worked. I have now rolled my test environment back to 0.16.0 and can successfully generate coverage information again; thank you for this amazing project!

I will keep following this project, and when this bug is fixed, I will try the new version as soon as possible.

djeedai commented 3 years ago

Possibly related to #463 which also appears to be a regression in 0.17.0, and also exhibits similar symptoms.

xd009642 commented 3 years ago

I've just yanked 0.17.0. I figured I'd spend a bit longer on the issues and didn't want to keep people's CI broken in the meantime.

xd009642 commented 3 years ago

So I have a potential fix for everything on the fix_unhandled_tids branch. I'll test it on zcash tomorrow; it just started to pass my minimal repo at midnight here, so I didn't want to start running another test that takes longer.

Ch00k commented 3 years ago

I really hate to be annoying, but is there any progress on this issue? Is there any help needed (testing etc.)? I would really like to be able to see coverage for a project with integration tests only, run against the binary (which was implemented in the yanked 0.17.0, #107).

xd009642 commented 3 years ago

So I've merged a partial fix to the develop branch which adds a --follow-exec flag to the CLI options, as I couldn't get it fully working with zebra. I am currently abroad on a business trip, so progress has stalled slightly as a result. One way to help is to use tarpaulin from develop with that option and see whether it all works for you.

I've got one bug that's also in 0.16.0 to fix that I just need to do a test for and then if follow-exec works for enough people I'll release a new version. I just need to find the time alongside my day job and a bunch of end of year deadlines...

Ch00k commented 3 years ago

The --follow-exec flag seems to work for me. I'll stick to installing from develop for the time being. Thanks so much! 👍

xd009642 commented 3 years ago

Closing this as only the zebra issue remains which seems to be separate.