nextest timeouts don't work because cargo-miri doesn't propagate signals?

saethlin commented 2 years ago

cc @sunshowers

If I try to run cargo miri nextest --no-fail-fast with a long-running test that should be timed out, the Miri process executing the test doesn't exit, only the cargo-miri process that nextest is managing exits. I think this is because cargo-miri isn't passing along the SIGTERM to its child.

That all sort of makes sense to me but I feel like I'm losing my mind because normal cargo miri test doesn't have this problem. If I kill any of the cargo-miri processes it takes down the Miri process as well. I've probably spent more time than I should trying to understand this, I think I'm just missing the Unix knowledge and understanding of how nextest works.

sunshowers commented 2 years ago

Huh that's weird! How are you configuring the timeout? Are you using nextest's support for timeouts?

Nextest sends SIGTERM to the child process (test), waits 10 seconds, then sends SIGKILL. This should be foolproof.

saethlin commented 2 years ago

Configured per your suggestion with --tool-config-file. Yup, nextest sends SIGTERM to a cargo-miri process which immediately exits, but leaves its child miri process just cranking away.

sunshowers commented 2 years ago

Ahh that makes sense. Killing grandchild processes is a bit of a nightmare especially on Unix, so nextest doesn't try and do that at the moment (though that may change in the future).

The cargo miri process run under nextest should probably forward the SIGTERM (and honestly do a SIGKILL after a couple of seconds) to the miri process.

See this link:

https://github.com/oconnor663/duct.py/blob/master/gotchas.md#killing-grandchild-processes

RalfJung commented 2 years ago

I guess the problem is that cargo-miri inserts itself between cargo (or, in this case, cargo-nextest) and the processes it spawns, to be able to adjust the flags. In an ideal world this would actually use POSIX exec, and then the final process graph would look like it does without cargo-miri. But I don't think the Rust standard library exposes a plain exec.

That all sort of makes sense to me but I feel like I'm losing my mind because normal cargo miri test doesn't have this problem. If I kill any of the cargo-miri processes it takes down the Miri process as well. I've probably spent more time than I should trying to understand this, I think I'm just missing the Unix knowledge and understanding of how nextest works.

Hm, that seems surprising? If you do Ctrl-C in the shell, I think the shell does some magic and sends SIGINT to a whole load of processes (all processes attached to this terminal, or something like that?), and that's what makes it work. But a single targeted signal to cargo-miri should behave the same whether you send it by hand or whether cargo-nextest sends it.

saethlin commented 2 years ago

As usual, you're right. Don't know what I was doing before, but if I cargo miri test then pkill cargo-miri, it definitely leaves the miri process running. So at least the bad behavior is consistent, maybe I was just tired.

sunshowers commented 2 years ago

On the nextest side I'm going to try setting up a process group (looks like Bazel does this for tests).

sunshowers commented 2 years ago

In https://github.com/nextest-rs/nextest/pull/393 I've switched nextest over to using process groups on Unix, which should address this on the nextest end.

I'm also going to do a similar patch on Windows using job handles.

sunshowers commented 2 years ago

https://github.com/nextest-rs/nextest/pull/396 is the fix for Windows. Should aim to get a new release out tomorrow.

sunshowers commented 2 years ago

FWIW this is addressed on nextest's end now, so the change in #2426 (while probably good in case the cargo-miri process gets a SIGTERM from some other source) isn't necessary for nextest.

saethlin commented 2 years ago

Yeah, I still definitely want the exec change because it doesn't make it seem like something is broken when you hit ctrl+c, and it also makes the experience looking at top or pgrep much more familiar.

saethlin commented 2 years ago

However, this issue has been addressed in nextest :) As-written, this is closed.

rust-lang / miri

nextest timeouts don't work because cargo-miri doesn't propagate signals? #2421