Closed saethlin closed 2 years ago
Huh that's weird! How are you configuring the timeout? Are you using nextest's support for timeouts?
Nextest sends SIGTERM to the child process (test), waits 10 seconds, then sends SIGKILL. This should be foolproof.
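The terminate-then-kill sequence described above can be sketched roughly as follows. This is not nextest's actual code: `terminate_with_grace` is a hypothetical helper, the `kill` declaration is written by hand to avoid depending on the `libc` crate, and the `unsafe extern` syntax assumes Rust 1.82 or newer on a Unix target.

```rust
use std::io;
use std::process::{Child, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

// Hand-written binding to POSIX kill(2); std already links libc on Unix.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

// SIGTERM is 15 on Linux and macOS (assumption: one of those targets).
const SIGTERM: i32 = 15;

/// Hypothetical helper: send SIGTERM to `child`, wait up to `grace` for it
/// to exit, then fall back to SIGKILL.
fn terminate_with_grace(child: &mut Child, grace: Duration) -> io::Result<ExitStatus> {
    // If the child has already exited, do not signal a pid that was reaped.
    if let Some(status) = child.try_wait()? {
        return Ok(status);
    }

    let pid = child.id() as i32;
    // SAFETY: kill(2) takes two integers and has no memory-safety preconditions.
    if unsafe { kill(pid, SIGTERM) } != 0 {
        return Err(io::Error::last_os_error());
    }

    // Poll until the child exits or the grace period runs out.
    let deadline = Instant::now() + grace;
    while Instant::now() < deadline {
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        thread::sleep(Duration::from_millis(20));
    }

    // Grace period expired: Child::kill sends SIGKILL on Unix.
    child.kill()?;
    child.wait()
}
```

Note that this only signals the direct child. Any process that the child itself spawned is not touched, which is the gap discussed in the rest of this thread.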
Configured per your suggestion with --tool-config-file. Yup, nextest sends SIGTERM to a cargo-miri process which immediately exits, but leaves its child miri process just cranking away.
Ahh that makes sense. Killing grandchild processes is a bit of a nightmare especially on Unix, so nextest doesn't try and do that at the moment (though that may change in the future).
The cargo miri process run under nextest should probably forward the SIGTERM (and honestly do a SIGKILL after a couple of seconds) to the miri process.
See this link:
https://github.com/oconnor663/duct.py/blob/master/gotchas.md#killing-grandchild-processes
I guess the problem is that cargo-miri inserts itself between cargo (or, in this case, cargo-nextest) and the processes it spawns, to be able to adjust the flags. In an ideal world this would actually use POSIX exec, and then the final process graph would look like it does without cargo-miri. But I don't think the Rust standard library exposes a plain exec.
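For reference, on Unix targets the standard library does provide `std::os::unix::process::CommandExt::exec`, which replaces the current process image instead of spawning a child. A minimal sketch of how a wrapper could use it is below; `exec_replace` is a hypothetical name, and this is not a claim about how cargo-miri is or should be implemented.

```rust
use std::ffi::OsStr;
use std::io;
use std::os::unix::process::CommandExt;
use std::process::Command;

/// Hypothetical wrapper step: replace the current process with `program`,
/// passing along the (possibly adjusted) arguments.
///
/// On success this never returns, because the calling process image is gone.
/// It only returns when the exec itself failed, so the return type is the error.
fn exec_replace<I, S>(program: &str, args: I) -> io::Error
where
    I: IntoIterator<Item = S>,
    S: AsRef<OsStr>,
{
    Command::new(program).args(args).exec()
}
```

Because the wrapper's pid is reused by the new program, a signal sent to that pid reaches the program directly and there is no intermediate process left to forward anything.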
That all sort of makes sense to me but I feel like I'm losing my mind because normal cargo miri test doesn't have this problem. If I kill any of the cargo-miri processes it takes down the Miri process as well. I've probably spent more time than I should trying to understand this, I think I'm just missing the Unix knowledge and understanding of how nextest works.
Hm, that seems surprising? If you do Ctrl-C in the shell, I think the shell does some magic and sends SIGINT to a whole load of processes (all processes attached to this terminal, or something like that?), and that's what makes it work. But a single targeted signal to cargo-miri should behave the same whether you send it by hand or whether cargo-nextest sends it.
As usual, you're right. Don't know what I was doing before, but if I cargo miri test then pkill cargo-miri, it definitely leaves the miri process running. So at least the bad behavior is consistent; maybe I was just tired.
On the nextest side I'm going to try setting up a process group (looks like Bazel does this for tests).
In https://github.com/nextest-rs/nextest/pull/393 I've switched nextest over to using process groups on Unix, which should address this on the nextest end.
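The process-group idea can be sketched as below. This is an illustration only and is not taken from the linked PR: `spawn_in_own_group` and `signal_group` are hypothetical helpers, `kill` is declared by hand rather than pulled from the `libc` crate, and the code assumes a Unix target with Rust 1.82 or newer (`CommandExt::process_group` itself has been stable since 1.64).

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// Hand-written binding to POSIX kill(2); std already links libc on Unix.
unsafe extern "C" {
    fn kill(pid: i32, sig: i32) -> i32;
}

// SIGTERM is 15 on Linux and macOS (assumption: one of those targets).
const SIGTERM: i32 = 15;

/// Hypothetical helper: spawn `cmd` as the leader of a new process group.
/// Passing 0 makes the new group's id equal to the child's pid.
fn spawn_in_own_group(cmd: &mut Command) -> io::Result<Child> {
    cmd.process_group(0).spawn()
}

/// Hypothetical helper: deliver `sig` to every process in the group led by
/// `leader`, including grandchildren that stayed in the group.
fn signal_group(leader: &Child, sig: i32) -> io::Result<()> {
    let pgid = leader.id() as i32;
    // SAFETY: kill(2) takes two integers and has no memory-safety preconditions.
    // A negative pid addresses the whole process group.
    if unsafe { kill(-pgid, sig) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

With this arrangement a test runner does not need the intermediate process to forward signals: a grandchild that has not moved itself into a different group receives the signal together with its parent.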
I'm also going to do a similar patch on Windows using job handles.
https://github.com/nextest-rs/nextest/pull/396 is the fix for Windows. Should aim to get a new release out tomorrow.
FWIW this is addressed on nextest's end now, so the change in #2426 (while probably good in case the cargo-miri process gets a SIGTERM from some other source) isn't necessary for nextest.
Yeah, I still definitely want the exec change because it doesn't make it seem like something is broken when you hit ctrl+c, and it also makes the experience looking at top or pgrep much more familiar.
However, this issue has been addressed in nextest :) As-written, this is closed.
cc @sunshowers
If I try to run cargo miri nextest --no-fail-fast with a long-running test that should be timed out, the Miri process executing the test doesn't exit; only the cargo-miri process that nextest is managing exits. I think this is because cargo-miri isn't passing along the SIGTERM to its child.