Closed mitnk closed 3 years ago
You can't just install a SIGCHLD
handler and allow it to execute at arbitrary points in time. The Rust runtime is not async-signal-safe. Imagine SIGCHLD
arrives in the middle of an allocation or other operation. This means your signal handler cannot safely execute any Rust code.
Instead, you need to think about when it is safe to run the SIGCHLD
handler. Generally, you need to ensure that no Rust code is in progress when SIGCHLD
fires and the handler executes. One strategy to do this is to block SIGCHLD
(via sigprocmask
) during all sections where you run Rust code.
Specifically, you could unblock it only when you call libc::read
to read input from the terminal. This will handle the case where children aren't being reaped while the shell waits for input on the prompt. To handle the first case (children exiting while you're waiting for a fg job) I'd recommend to call waitpid(-1, ...)
Thanks. I'll try find places need to block the signal.
To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...
In what real cases, is waitpid(-1)
needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):
$ sleep 5 &
$ sleep 100
Thanks. I'll try find places need to block the signal.
To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...
In what real cases, is
waitpid(-1)
needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):$ sleep 5 & $ sleep 100
The problem with letting the SIGCHLD
handler reap this job is that you would need to unblock SIGCHLD
while you're calling waitpid()
. You could probably do that:
sig_unblock(SIGCHLD)
waitpid()
sig_block(SIGCHLD)
if you instead calling waitpid(-1, ...)
you would save the unblock/block.
It's also more efficient to wait for all children than to poll for each bg job in a loop the way your old code did, btw.
The problem with letting the SIGCHLD handler reap this job is that you would need to unblock SIGCHLD while you're calling waitpid()
1) For waiting the fg job (logic of current main program code), I do not want to use wait -1
, since that would change the way to organize the results of a pipeline foo | bar | baz
, if bar or baz exit before foo.
2) why I need to unblock SIGCHLD before waitpid()? I'm only expecting block and then timely unblock SIGCHLD occasionally, like:
sig_block(SIGCHLD)
sth = read(100)
sig_unblock(SIGCHLD)
i.e. main program is not blocking SIGCHLD in most time.
Just checked some docs on Interruption of signal handlers (please search to locate section titled "Interruption of system calls and library functions by signal handlers"): There are some system calls (read()
is included) that can automatically restart when interrupted by single handler (use flag SA_RESTART
when installing single handler).
So I guess there may be rare places need to block/unblock SIGCHLD?
I doesn't make sense to block around read, you need to unblock around the read.
You can't take a signal most of the time because it could arrive, for instance, when your Rust code does a Box::new()
or similar and executes inside the allocator.
You can't take a signal most of the time ...
Ah, I seem to follow you now: you mean we need to block SIGCHLD at the most time, to make Rust safe.
See https://github.com/vorner/signal-hook/issues/109 for more discussion.
The issue of restarting is orthogonal. If you unblock during the read() call and the signal handler executes, the read() will be restarted.
@godmar Had a glance on the links you provided. Seems it's way complex-er than my expectation :( I may leave this issue/PR open before Rust address it.
Personally, I can live with both options below: a) Keep cicada shell not having a SIGCHLD handler. b) Assuming possibility of the signal handler breaks Rust is very very low (near not happening in real use, at least in cicada project), we can install the SIGCHLD handler without doing extra block/unblock logics.
Handling block/unblock in every corner seems over complex to me. I'll spend more time to see anything proper we have.
Handling block/unblock in every corner seems over complex to me.
FWIW, I implemented a shell in Rust as well.
In my opinion, I think it's completely ok to leave SIGCHLD blocked all the time, except when reading input. So you don't block/unblock in every corner. You block, and you have a single site where you unblock, read, and block. (I've done this by patching rustyline
).
SIGCHLD
is not latency critical, and your shell will spend most of its time either waiting for a fgjob or waiting for input, so there's not much to be gained from frequent blocking/unblocking.
ps: I also don't unblock while waiting for a fgjob because I use waitpid(-1, ...)
I implemented a shell in Rust as well
Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)
it's completely ok to leave SIGCHLD blocked all the time, except when reading input.
Good to know. I'll try it. Thanks
I implemented a shell in Rust as well
Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)
No. I'm contemplating it for an assignment in a class I'm teaching.
If you code:
signals::unblock_signals();
match rl.read_line() {
...
then a signal (SIGCHLD
) could arrive right in the middle of rl.read_line()
, which is not safe.
FWIW, I'm using rustyline
not linefeed
. The place where I am unblocking SIGCHLD
is around this function.
a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe
@godmar correct sir. It's a tradeoff of not forking and patching lib linefeed
. It's also not safe to unblock before waitpid for fg pipelines too.
Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?
a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe
@godmar correct sir. It's a tradeoff of not forking and patching lib
linefeed
. It's also not safe to unblock before waitpid for fg pipelines too.
unblocking before libc::waitpid
is fine because waitpid is async-signal-safe. What's not async-signal-safe is running Rust code that potentially allocates or drop.
Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?
Here's one in C: https://www.cs.cmu.edu/afs/cs/academic/class/15213-f16/www/code/15-ecf-signals/signaldeadlock.c It'll work similarly in Rust.
Merging PR now :)
linefeed::read_line()
is a concern with signal handlers.BTW, when using waitpid(-1)
for fg jobs (see last commit), the main thread has to do some sleep between waitpid()
s. Not sure if this would affect commands like time <any-commands>
etc. Any ideas here?
Ohh, with sleep(1ms)
with waitpid(-1)
, now it's consuming too much CPU (~2%) with long-run commands (e.g. sleep 9999
). Changing to 5ms take ~0.8% CPU and 10ms to ~0.4%. Also causing "Idle Wake Ups" (seen in Mac's Activity Monitor). (the sleep won't affect result of time foobar
tho).
Seems I need to change back to sync waitpid(pid)
for fg jobs.
Oh man, I should not use WNOHANG
here.. problem solved.
try fix #33