Open Firstyear opened 3 years ago
I didn't know that we set a signal handler if you did not directly use the tokio::signal
module.
Signal handlers are lazily installed the first time a Signal
instance is created.
(Note that on Unix platforms tokio's process management will also install a signal handler, in case some other part of the code is using processes)
https://github.com/tokio-rs/tokio/blob/master/tokio/src/runtime/driver.rs#L10
When enable_io is used, process, signal and io handlers are all installed. Unless I'm mistaken this is the what I'm asking, if this could be enabled in a more fine-grained manner.
@Firstyear
https://github.com/tokio-rs/tokio/blob/master/tokio/src/runtime/driver.rs#L10
This is where the driver is registered, but signal handlers are lazily registered here: https://github.com/tokio-rs/tokio/blob/master/tokio/src/runtime/driver.rs#L10
It seems like something in the application is registering a signal one way or another. If the signal driver is disabled, then that part of the code will panic (although it will help you find out where the registration is happening at least).
I'm not very familiar with SA_ONSTACK
, but maybe there is a way we can allow the signal driver to be configured to use it
I found this issue after running into the same problem with a Rust library that's being called from Go. I wrote a little test program and was able to confirm that the signal handler for SIGCHLD is installed eagerly when the process
and signal
features are enabled, even when the program does nothing but sleep, and exit.
Here's the main.rs from the test program:
#[tokio::main]
async fn main() {
tokio::time::sleep(std::time::Duration::from_secs(600)).await;
}
I ran that, and then followed these instructions while the program was sleeping.
With the dependency tokio = {version = "1", features = ["process", "signal", "rt-multi-thread", "time", "macros"]}
, I get the following output from cat /proc/<pid>/status
:
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000180010440
The SigCgt above indicates that it is catching signal 17 (SIGCHLD). When I remove the process
and signal
features, then the output shows SigCgt: 0000000180000440
, indicating that it's not catching SIGCHLD.
I don't know whether that should be considered a bug or not, or if it was intentional. I could see an argument either way. But I did confirm that with tokio 0.2.25 this signal handler is not installed eagerly, but with 0.3.7 it is.
I think that fine-grained enable_
I have noticed that if any crate in my dependency chain requests signal then it "taints" all the way back to my crate even though I do not have the signal feature enabled. So fine grained enable/access would probably help here anyway.
I wrote a little test program and was able to confirm that the signal handler for SIGCHLD is installed eagerly when the process and signal features are enabled, even when the program does nothing but sleep, and exit.
Ah yes you're right! I had forgotten that the process
driver will eagerly install a SIGCHLD handler which is used for all spawned processes. That's likely the culprit of the original issue. I agree that a finer grained configuration of the runtime could prevent such eager registrations
https://github.com/tokio-rs/tokio/pull/3521 <<-- my PR does this, but it's rough, likely not ready for merge at all. The names probably aren't great either. Maybe it's a start you can build from or you can advise on how to clean it up?
Thanks for posting your issue here.
Could you elaborate on why omitting the signal
feature flag doesn't work for your case? This is the intended strategy for disabling signal handling.
Because another part of my application does require signal handling.
Because another part of my application does require signal handling.
Wouldn't this part of the application turn signal handling back on in the runtime and bring you back to the same issue? I'm not super familiar with SA_ONSTACK
, but wonder if there is a way to teach the runtime how to use it and have the application opt-into such a configuration
@ipetkov It's actually worse than that. What it is is that this part of the application sets up one tokio runtime without signal handling in the pam/nsswitch modules. But because we link to another crate which also has tokio, and that does have the signal feature, it taints through and activates it in this part of the application. IE if you have A without signal, and it depends on B that does have signal, A has signal activated.
That's why I implemented https://github.com/tokio-rs/tokio/pull/3521 because this allows the runtime in A to select what it needs, even though the tokio features have been tainted through the dependency on B.
I'm not really invested enough to try to solve the SA_ONSTACK problem, but certainly it has value for tokio in being able to better cooperate with languages like go that do set SA_ONSTACK.
I'm here because I found out that tokio is installing a global SIGCHLD
handler, which in my case conflicts with the one from glib.
Now this is not a new problem, and as of relatively recently pidfd exists on Linux to fix this since you get a file descriptor that one can just add to epoll etc. Looks like there's an async-pidfd crate. Though, this is also something that the standard library could benefit from...ah and now that I look, there's this tracking issue and some recent attempts to use it.
I think discussing SA_ONSTACK
here is a bit of a red herring - the real issue is that Tokio installing a SIGCHLD
handler at all may cause whatever else in the process is doing so to lose notifications.
OK wait, no apparently I'm wrong, Go (naturally now that I think about it) doesn't use SIGCHLD
as far as I can tell, it just uses goroutines that block in waitpid()
. Each call to cmd.Wait()
ends up around here.
But that said, either way it's still process global state; the best long term solution is to get away from that. On Linux, that's pidfds as mentioned above. Unfortunately I need to care about RHEL8 for quite a while, and I'm sure I'm not the only one needing something that works on pre-pidfd Linux (or other Unix I guess).
In the case of a multithreaded Tokio runtime, we could perhaps follow Go's model and basically do a thread per child process. Maybe as some sort of opt-in tokio::runtime::Builder::threaded_waitpid()
? That'd work for everyone who owns the end binary linking in tokio alongside something else.
In the case of a multithreaded Tokio runtime, we could perhaps follow Go's model and basically do a thread per child process. Maybe as some sort of opt-in tokio::runtime::Builder::threaded_waitpid()?
In case anyone else comes here, one can also work around this by avoiding using Tokio's child process APIs and instead just use the ones from std
in a thread. See this PR:
https://github.com/containers/containers-image-proxy-rs/pull/13
which was a quite seamless change in that case.
the real issue is that Tokio installing a SIGCHLD handler at all may cause whatever else in the process is doing so to lose notifications.
I'm curious to understand what may be happening here. Tokio uses the signal-hook
crate to install a OS signal handler, and it should take care of forwarding the signal dispatch to any other previous signal handler, as expected. (At least this was the case when I looked at this last and I don't believe anything should have changed there).
Tokio uses the signal-hook crate to install a OS signal handler, and it should take care of forwarding the signal dispatch to any other previous signal handler, as expected.
Ah, I see. (reads) - that's some nice code.
In my case the GLib code happens to run after tokio, and the GLib code is making no attempt to preserve the previous handler. Hmm...that's obviously doable in the same way that signal-hook is. May look at that, but I'd need to wait a while before I could rely on it. And now that I think about it, I could likely work around this by spawning a dummy process before using tokio too.
OK so I take my earlier comments back, this issue is then somewhat specific to Go's use of SA_ONSTACK
.
But the above around using pidfd etc. on Linux I think still stands as the right fix at least on Linux.
In my case the GLib code happens to run after tokio, and the GLib code is making no attempt to preserve the previous handler.
The Unix signal APIs are far from ergonomic, though chaining signal handlers is the "expected" behavior one should follow. I find it unfortunate (and frankly surprising) that the GLib will clobber this outright...
That said, I have no problems with exploring a pidfd
based implementation for Tokio and I'd be happy to review any such efforts! The tricky bit will be implementing a form of feature detection and falling back to OS signals if pidfd
is not available (Tokio does not have a supported-kernel-version policy, and requiring that pidfd
is available could be a breaking change for some users).
Re pidfd, I think it makes sense to wait until it lands in std
; there is extensive discussion in the tracking issue https://github.com/rust-lang/rust/issues/82971 - in particular it seems to have gotten hung up on issues around going behind libc's back for clone3
, but I may try to see if we can restart things there around just using pidfd_open()
if available.
process::Child::wait
, I think it might fix this issue.We could still register a signal handler if the fallback is used. Furthermore, tokio::signal
also registers signal handlers.
You are right, this issue is more general than just tokio::process
.
Version tokio-util v0.6.3 () tokio v1.2.0 ()
Platform Linux pyrite 5.10.12-1-default #1 SMP Sat Jan 30 19:15:49 UTC 2021 (a3c8888) x86_64 x86_64 x86_64 GNU/Linux
Description
When a module built with the tokio runtime is linked or used by an application with go, the related go module crashes. In this situation, the cause is that the nss_kanidm.so module which is built with tokio 1.2.0 is loaded via nsswitch, and this interfers with the signal handling of the docker daemon causing it to crash.
The previous nss_kanidm tokio cargo.toml was:
nss_kanidm module would make a call to a localhost resolver via a unix domain socket consuming a tokio util codec type. However this required block_on and a tokio run time, so the following code was used in the nss client support module.
As tokio was built with
signal
support the call to "enable_all" would cause signal handlers to be added. These signal handlers are managed by signal-hook-registry, and it appears they are not loaded with thesigaction
flags forSA_ONSTACK
. Go loads it's signal handlers on an alternate stack, meaning any other registered handlers which do not use this flag cause the Go runtime to panic.Workarounds
Remove the "signal" flag from tokio in Cargo.toml to prevent these being loaded in the call for enable_all.
Possible Resolutions
I think there are a few ways this could be handled.
One is that signal-hook-registry has the capability to call sigaction with SA_ONSTACK in case that Rust + Go is linked in the same situation, but this may require work with the related projects.
Another possible approach is that Builder is very "coarse" with the enable flags. The options today are:
Today enable_io from docs.rs states "Doing this enables using net, process, signal, and some I/O types on the runtime." This is rather "coarse", when the Cargo.toml in tokio shows that signal is not a dependency to net/io (but is for process).
So perhaps tokio could improve the enable_ flags to have more fine grained settings such as enable_net, enable_process (implies signal). This would allow consumers to have finer control to avoid signal handler registration which can also work around the issue.
Thanks,