Closed lionel- closed 1 month ago
The Windows tests are failing because on Windows we loop over processes with ps_kill(). The create-time test creates a process handle with a bumped create-time to simulate PID reuse, and on Windows we get:
Error: No such process, pid 1044, ???
Is this expected behaviour? It seems a bit dangerous to throw an error just because a process might have been killed or terminated already. Considering the races involved and that ps_kill() will mostly be called in a cleanup context, it seems that this makes it hard to use.
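To make the concern concrete, here is a minimal Python sketch (not ps code) of a kill helper that treats "no such process" as a normal outcome rather than an error. The name kill_if_alive is hypothetical, and the sketch deliberately ignores PID reuse, which the create-time check in ps is meant to handle.

```python
import os
import signal


def kill_if_alive(pid, sig=signal.SIGTERM):
    """Send `sig` to `pid`; return False instead of raising if it is gone.

    Hypothetical helper for illustration only. It does not guard against
    PID reuse and lets other errors (e.g. PermissionError) propagate.
    """
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # The process exited between the caller's check and our signal.
        return False
    return True
```

In cleanup code this lets the caller loop over a list of PIDs without wrapping every call in its own error handling.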
Maybe we don't need to make it interruptible? After all, the grace period will typically be small, and people can wait a couple of seconds.
But of course the main issue is the SIGCHLD. I wonder if we can do better for that nowadays. Probably not, but let me look around a bit. AFAIR there are some macOS specific and/or Linux specific APIs that will let you poll for the termination of multiple processes.
Seems like we can use pidfd_open() on Linux, and macOS has kqueue, which can poll for process termination.
pidfd_open() works great, but it is a pity that it needs Linux kernel 5.3 or later. RHEL 8.10 has 4.18, and it is supported until 2029 (!), so we'll clearly need a fallback, using the self-pipe and the SIGCHLD handler.
Debian 10 also has 4.x, but it is EOL June 2024, so that is OK.
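For illustration, a rough Python sketch of the "try pidfd_open(), otherwise fall back" shape. It assumes Python 3.9+ for os.pidfd_open; the function name wait_pidfd is hypothetical, and returning None stands in for "use the fallback path" (the self-pipe fallback itself is not shown here).

```python
import os
import select


def wait_pidfd(pid, timeout):
    """Wait up to `timeout` seconds for `pid` to exit, using a pidfd.

    Returns True if the process exited, False on timeout, and None when
    pidfd_open() is unavailable (non-Linux, old Python, kernel < 5.3),
    in which case the caller would need another mechanism.
    """
    opener = getattr(os, "pidfd_open", None)
    if opener is None:
        return None
    try:
        fd = opener(pid)
    except ProcessLookupError:
        return True  # already gone
    except OSError:
        return None  # e.g. ENOSYS on an older kernel
    try:
        # A pidfd becomes readable once the process has terminated.
        readable, _, _ = select.select([fd], [], [], timeout)
        return bool(readable)
    finally:
        os.close(fd)
```

Unlike waitpid(), this works for processes that are not children of the caller, which is the property the thread is after.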
OK, with the new ps_wait() this is going to be much simpler, so I am going to close this and create another PR.
I am sorry again for the long wait, and thank you for the PR! I know it was a lot of work to implement it, but we'll have a better alternative now that (at last!) we can poll non-child processes on all three platforms.
Kills a list of process handles in parallel with a grace period.
First a SIGTERM is issued to allow processes to gracefully terminate if they can.
We listen for termination events of immediate subprocesses via a SIGCHLD handler that wakes up a bottom half with a one-byte write to a self-pipe. This mostly follows what processx is doing for the wait() method. The self-pipe polling is regularly interrupted to check for user interrupts and we take this opportunity to check for termination of subsubprocesses for which we don't get notified on termination. This means that if the list only contains immediate subprocesses we are able to return quicker than if it contains indirect subprocesses.
When the grace period is up and some processes are still running, those are sent SIGKILL.
The creation time of the process is consistently checked to avoid killing reused PIDs.
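The flow described above can be sketched in Python as follows. This is an illustration of the general technique, not the C implementation in this PR: the name kill_with_grace and its parameters are hypothetical, it only handles direct children (subprocess.Popen objects), it skips the create-time check, and it must run in the main thread because it installs a signal handler.

```python
import os
import select
import signal
import time


def kill_with_grace(children, grace=1.0, tick=0.1):
    """SIGTERM every child, wait up to `grace` seconds, SIGKILL the rest.

    `children` is a list of subprocess.Popen objects. Returns the PIDs
    that had to be sent SIGKILL.
    """
    rfd, wfd = os.pipe()
    os.set_blocking(rfd, False)
    os.set_blocking(wfd, False)
    # A Python-level handler must be installed for the wakeup fd to fire.
    old_handler = signal.signal(signal.SIGCHLD, lambda signum, frame: None)
    old_wakeup = signal.set_wakeup_fd(wfd)  # one byte written per signal
    try:
        for child in children:
            if child.poll() is None:
                child.terminate()  # SIGTERM
        deadline = time.monotonic() + grace
        alive = [c for c in children if c.poll() is None]
        while alive:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            # Sleep on the self-pipe, waking at least every `tick` seconds
            # so other periodic checks could run here.
            ready, _, _ = select.select([rfd], [], [], min(tick, remaining))
            if ready:
                try:
                    os.read(rfd, 4096)  # drain pending wakeup bytes
                except BlockingIOError:
                    pass
            alive = [c for c in alive if c.poll() is None]
        for child in alive:
            child.kill()  # SIGKILL once the grace period is over
        for child in alive:
            child.wait()
        return [c.pid for c in alive]
    finally:
        signal.set_wakeup_fd(old_wakeup)
        signal.signal(
            signal.SIGCHLD,
            old_handler if old_handler is not None else signal.SIG_DFL,
        )
        os.close(rfd)
        os.close(wfd)
```

Because SIGCHLD wakes the select() call immediately, the function returns as soon as the last child exits instead of always waiting for the next tick, which mirrors the "immediate subprocesses return quicker" behaviour described above.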
Next steps:
Use in ps_kill_tree() to add graceful termination to processx kill_tree() methods. I suggest replacing the sig argument by grace, and adding a separate ps_signal_tree() for sending custom signals without the grace mechanism. A CRAN org search shows that the sig argument is not used on CRAN.
Use in processx session finalizers to sigterm all processes in parallel on session quit.