--follow-exec causes difference in execution

boustrophedon commented 2 years ago

Describe the bug In some cases it appears using --follow-exec can cause test code to not run somehow.

To Reproduce https://github.com/boustrophedon/tarpaulin_missing_coverage/commits/master

The original problem I'm trying to solve is the following: I have tests in tests/ and examples in examples/ and I want to get combined coverage for all of them. If I just run cargo tarpaulin --tests --examples, the examples don't execute because the test runner is used to look for tests rather than execute the examples (related?). I've worked around this by adding a small test in each example that just calls main.

However, one of my example programs calls itself (which, as above, is actually the test runner process) as a subprocess and although it appears the processes are executing (e.g. if I intentionally throw a panic in one, the panic shows up), tarpaulin isn't catching that.

While investigating this I've found a minimal example that shows just by adding and removing the --follow-exec flag that the subprocess code isn't being called somehow.

Expected behavior The subprocess code should run, the file should be written to, and the CI step "Check file was actually created" should pass.

fails: https://github.com/boustrophedon/tarpaulin_missing_coverage/commit/9d91fb202ae08b575528e6aa13c3404e954cd0c9 succeeds: https://github.com/boustrophedon/tarpaulin_missing_coverage/commit/4e92bfe238d5391eac58f3c562b8316796cb4655

xd009642 commented 2 years ago

so for the first part, cargo tarpaulin --examples is equivalent to cargo test --examples which is because of the issue you linked. If you want to avoid creating example tests you can run the examples directly with cargo tarpaulin --command Build --examples and use a config file to combine that with other test types. Something as follows should work:

[examples]
command = "Build"
run-types = ["Examples"]

[tests]
run-types =["tests"] # Just put the others here - maybe an empty section will work but I haven't tried

I have a feeling this may be related to some things I've been seeing with https://github.com/xd009642/tarpaulin/issues/953 so will try the examples and see if i can puzzle it out

boustrophedon commented 2 years ago

Thanks for the tip on using the run-types = Examples in a config file!

Let me know if there's anything I can do to help with understanding the weird behavior of --follow-exec.

xd009642 commented 2 years ago

So I tried this out and it fixes your example https://github.com/xd009642/tarpaulin/pull/962 if you want to try it on your own project :eyes:

boustrophedon commented 2 years ago

https://github.com/boustrophedon/tarpaulin_missing_coverage/runs/5400464385

Here's what I'm getting when using the issue/process-kill-953 branch


running 1 test
test call_main ... 
running 1 test
test call_main ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Error: "Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response"
Mar 03 00:43:07.529 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response
Error: Process completed with exit code 1.

boustrophedon commented 2 years ago

Oh, I should change the code to remove the test harness and see if that does anything though.

boustrophedon commented 2 years ago

nope, same failure https://github.com/boustrophedon/tarpaulin_missing_coverage/runs/5400595536

xd009642 commented 2 years ago

Interesting, I tried it on 4e92bfe238d5391 and got 100% coverage :thinking:

xd009642 commented 2 years ago

Oh it only times out using the config for me so must be another issue in the config stuff :eyes: fun

xd009642 commented 2 years ago

It was me being a dummy, follow-exec wasn't aliased in serde so it expected follow_exec, I've added the alias in now though, it does time out without follow-exec though...

xd009642 commented 2 years ago

Okay, If you try now it works with and without the config file! Finally satisfied I've laid this one to rest. And I think this may have done enough to unblock me on #953 so thanks :grin:

boustrophedon commented 2 years ago

2022-03-03-18:38:31 https://github.com/boustrophedon/tarpaulin_missing_coverage/runs/5415282826?check_suite_focus=true

It worked!

Thanks so much for fixing this!

xd009642 commented 2 years ago

Brilliant, I'll be tweaking the branch a bit more to try and fix the issue it was originally there to fix. But once it's merged in I'll cut a new release :+1:

boustrophedon commented 2 years ago

Awesome. Should I close this issue or do you want to do it when you do the release?

xd009642 commented 2 years ago

I'll do it when the PR's merged, that way you'll know the release is coming imminently and can switch your CI to use the latest release :+1:

xd009642 commented 2 years ago

Just a reminder in case I don't solve this change in behaviour, it may become necessary to add an explicit wait() to that example test case to stop tarpaulin continuing on the exec killing the test process and then not capturing the exec'd coverage. But I should be able to solve that issue as well.

It's just the follow-exec issues recently are churning up a lot of the behaviours :sweat_smile: . Hopefully, it will be a lot simpler (and faster to execute) after this though

xd009642 commented 2 years ago

Release 0.20.0 will be out once CI finishes with the fix for this issue :+1:

boustrophedon commented 2 years ago

I'm getting segfaults when I upgrade to 0.2 :(

https://github.com/boustrophedon/extrasafe/runs/5638757347

but the minimal example builds and runs: https://github.com/boustrophedon/tarpaulin_missing_coverage/actions/runs/2020270268 https://coveralls.io/github/boustrophedon/tarpaulin_missing_coverage

xd009642 commented 2 years ago

A quick check are you setting --test-threads in any way? I found there was a weird spurious one until I set it to 1 and then it disappeared completely.

boustrophedon commented 2 years ago

It's just calling cargo tarpaulin: https://github.com/boustrophedon/extrasafe/blob/ipc_coverage-rebase/.github/workflows/build-test.yaml#L60

I wonder if the examples are also being run with the equivalent of test-threads 1? https://github.com/boustrophedon/extrasafe/blob/ipc_coverage-rebase/.tarpaulin.toml#L3

boustrophedon commented 2 years ago

Actually it looks like this failure is probably on my side. Out of curiosity, is tarpaulin using signals in some way to catch syscalls or something?

xd009642 commented 2 years ago

Not explicitly, ptrace does catch all the signals, but tarpaulin will forward ones that it doesn't think it has use for back to the test i.e. SIGCHLD to identify a spawned processed has finished so that spawned commands can be waited on

boustrophedon commented 2 years ago

So unfortunately what's happening is that test-threads=1 actually just breaks all my tests. This is because my library is a wrapper around seccomp, which allows you to tell the kernel to deny the usage of syscalls of your choosing, for security reasons. Seccomp filters are applied to the current thread (and are inherited by child threads which isn't relevant here), so when you run two tests with different filters, the intersection of the two filters happens.

In particular, what's happening is that one test is only allowing filesystem operations, another one is allowing only network operations, and so when they run sequentially on the same thread, the end result is that neither is allowed, and the test fails.

When test-threads=1, the rust test runner says "we don't need to spawn new threads, just use the current one" (see here and run_test/run_test_inner later in the same file).

boustrophedon commented 2 years ago

With RUST_TEST_THREADS=2 and just running the tests, it doesn't have the issue.

However with RUST_TEST_THREADS=2 and running the examples as well, it segfaults on the subprocess example.

xd009642 commented 2 years ago

Okay looks like for you tests I have to fix that test thread > 1 segfault that happens like 4% of the time on my machine and >99% of the time on CI 😢.

I did find the segfault only happened on nightly not stable so maybe running on stable for coverage could be a stop gap solution?

boustrophedon commented 2 years ago

I'm not selecting nightly anywhere - does tarpaulin select nightly internally somewhere? Per the github CI documentation it's running rust 1.59.

I'll try adding an explicit .wait() at the subprocess call and see if that fixes it.

xd009642 commented 2 years ago

No it doesn't, I just saw the segfault in CI only on nightly for my own tests, maybe yours just exercises the issue stronger so it appears on stable :thinking:

boustrophedon commented 2 years ago

Actually I just checked and I have explicit kill calls.

What's happening is that I have:

The main process
A "db server" process
A "web server" process and then two client processes that make network calls to the webserver

First I start the db server in a subprocess (without wait ing since it's a server), and per the CI log the db process gets to the point where it's waiting

Then we sleep for 100 ms (which could be what's making the issue occur every time) so that we're sure the server's ready.

Then we try to start the webserver but we never actually get there because the first line of it is a println that doesn't show up.

So either the sleep is causing the issue or the second subprocess call itself. In particular, the subprocess call is calling the same executable as the first, /proc/self/exe, just with different arguments.

orium commented 2 years ago

I have the same problem in orium/cargo-rdme:

$ cargo tarpaulin --follow-exec
⋮
Jun 14 10:57:52.890  INFO cargo_tarpaulin::process_handling::linux: Launching test
Jun 14 10:57:52.890  INFO cargo_tarpaulin::process_handling: running /home/orium/programming/projects/cargo-rdme/target/debug/deps/tests-e4918e5f44a98741

running 44 tests
test system_test_avoid_overwrite_uncommitted_readme ... ok
test system_test_custom_lib_path ... Jun 14 10:57:57.976 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"

Setting the number of threads to 1 doesn't seem to make a difference.

This happens with tarpaulin 0.20.1.

xd009642 / tarpaulin

--follow-exec causes difference in execution #966