mitnk / cicada

An old-school bash-like Unix shell written in Rust
https://hugo.wang/cicada/
MIT License
981 stars 50 forks source link

adding SIGCHLD signal handler #38

Closed mitnk closed 3 years ago

mitnk commented 3 years ago

try fix #33

godmar commented 3 years ago

You can't just install a SIGCHLD handler and allow it to execute at arbitrary points in time. The Rust runtime is not async-signal-safe. Imagine SIGCHLD arrives in the middle of an allocation or other operation. This means your signal handler cannot safely execute any Rust code.

Instead, you need to think about when it is safe to run the SIGCHLD handler. Generally, you need to ensure that no Rust code is in progress when SIGCHLD fires and the handler executes. One strategy to do this is to block SIGCHLD (via sigprocmask) during all sections where you run Rust code.

Specifically, you could unblock it only when you call libc::read to read input from the terminal. This will handle the case where children aren't being reaped while the shell waits for input on the prompt. To handle the first case (children exiting while you're waiting for a fg job) I'd recommend to call waitpid(-1, ...)

mitnk commented 3 years ago

Thanks. I'll try find places need to block the signal.

To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...

In what real cases, is waitpid(-1) needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):

$ sleep 5 &
$ sleep 100
godmar commented 3 years ago

Thanks. I'll try find places need to block the signal.

To handle the first case (children exiting while you're waiting for a fg job) ... to call `waitpid(-1, ...

In what real cases, is waitpid(-1) needed? I don't see the necessary of using it at least for normal cases like following (the sig handler will reap the bg job):

$ sleep 5 &
$ sleep 100

The problem with letting the SIGCHLD handler reap this job is that you would need to unblock SIGCHLD while you're calling waitpid(). You could probably do that:

sig_unblock(SIGCHLD)
waitpid()
sig_block(SIGCHLD)

if you instead calling waitpid(-1, ...) you would save the unblock/block. It's also more efficient to wait for all children than to poll for each bg job in a loop the way your old code did, btw.

mitnk commented 3 years ago

The problem with letting the SIGCHLD handler reap this job is that you would need to unblock SIGCHLD while you're calling waitpid()

1) For waiting the fg job (logic of current main program code), I do not want to use wait -1, since that would change the way to organize the results of a pipeline foo | bar | baz, if bar or baz exit before foo.

2) why I need to unblock SIGCHLD before waitpid()? I'm only expecting block and then timely unblock SIGCHLD occasionally, like:

sig_block(SIGCHLD)
sth = read(100)
sig_unblock(SIGCHLD)

i.e. main program is not blocking SIGCHLD in most time.

mitnk commented 3 years ago

Just checked some docs on Interruption of signal handlers (please search to locate section titled "Interruption of system calls and library functions by signal handlers"): There are some system calls (read() is included) that can automatically restart when interrupted by single handler (use flag SA_RESTART when installing single handler).

So I guess there may be rare places need to block/unblock SIGCHLD?

godmar commented 3 years ago

I doesn't make sense to block around read, you need to unblock around the read. You can't take a signal most of the time because it could arrive, for instance, when your Rust code does a Box::new() or similar and executes inside the allocator.

mitnk commented 3 years ago

You can't take a signal most of the time ...

Ah, I seem to follow you now: you mean we need to block SIGCHLD at the most time, to make Rust safe.

godmar commented 3 years ago

See https://github.com/vorner/signal-hook/issues/109 for more discussion.

The issue of restarting is orthogonal. If you unblock during the read() call and the signal handler executes, the read() will be restarted.

mitnk commented 3 years ago

@godmar Had a glance on the links you provided. Seems it's way complex-er than my expectation :( I may leave this issue/PR open before Rust address it.

Personally, I can live with both options below: a) Keep cicada shell not having a SIGCHLD handler. b) Assuming possibility of the signal handler breaks Rust is very very low (near not happening in real use, at least in cicada project), we can install the SIGCHLD handler without doing extra block/unblock logics.

Handling block/unblock in every corner seems over complex to me. I'll spend more time to see anything proper we have.

godmar commented 3 years ago

Handling block/unblock in every corner seems over complex to me.

FWIW, I implemented a shell in Rust as well. In my opinion, I think it's completely ok to leave SIGCHLD blocked all the time, except when reading input. So you don't block/unblock in every corner. You block, and you have a single site where you unblock, read, and block. (I've done this by patching rustyline).

SIGCHLD is not latency critical, and your shell will spend most of its time either waiting for a fgjob or waiting for input, so there's not much to be gained from frequent blocking/unblocking.

ps: I also don't unblock while waiting for a fgjob because I use waitpid(-1, ...)

mitnk commented 3 years ago

I implemented a shell in Rust as well

Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)

it's completely ok to leave SIGCHLD blocked all the time, except when reading input.

Good to know. I'll try it. Thanks

godmar commented 3 years ago

I implemented a shell in Rust as well

Yeah, I read that, that's nice! Any chance to make it public when you polished it maybe? :)

No. I'm contemplating it for an assignment in a class I'm teaching.

godmar commented 3 years ago

If you code:

signals::unblock_signals();
match rl.read_line() {
 ...

then a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe.

godmar commented 3 years ago

FWIW, I'm using rustyline not linefeed. The place where I am unblocking SIGCHLD is around this function.

mitnk commented 3 years ago

a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe

@godmar correct sir. It's a tradeoff of not forking and patching lib linefeed. It's also not safe to unblock before waitpid for fg pipelines too.

Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?

godmar commented 3 years ago

a signal (SIGCHLD) could arrive right in the middle of rl.read_line(), which is not safe

@godmar correct sir. It's a tradeoff of not forking and patching lib linefeed. It's also not safe to unblock before waitpid for fg pipelines too.

unblocking before libc::waitpid is fine because waitpid is async-signal-safe. What's not async-signal-safe is running Rust code that potentially allocates or drop.

Just for curious, do you ever have a Rust demo code that can produce/simulate an asycn-signal-safe issue?

Here's one in C: https://www.cs.cmu.edu/afs/cs/academic/class/15213-f16/www/code/15-ecf-signals/signaldeadlock.c It'll work similarly in Rust.

mitnk commented 3 years ago

Merging PR now :)

  1. Let's track in another issue if linefeed::read_line() is a concern with signal handlers.
  2. Will handle issue of STOP/CONT status of job control in another PR.

BTW, when using waitpid(-1) for fg jobs (see last commit), the main thread has to do some sleep between waitpid()s. Not sure if this would affect commands like time <any-commands> etc. Any ideas here?

mitnk commented 3 years ago

Ohh, with sleep(1ms) with waitpid(-1), now it's consuming too much CPU (~2%) with long-run commands (e.g. sleep 9999). Changing to 5ms take ~0.8% CPU and 10ms to ~0.4%. Also causing "Idle Wake Ups" (seen in Mac's Activity Monitor). (the sleep won't affect result of time foobar tho).

Seems I need to change back to sync waitpid(pid) for fg jobs.

mitnk commented 3 years ago

Oh man, I should not use WNOHANG here.. problem solved.