open-cli-tools / concurrently

Run commands concurrently. Like `npm run watch-js & npm run watch-less` but better.
https://www.npmjs.com/package/concurrently
MIT License
6.98k stars 227 forks source link

concurrently exitcode=0 when it receives SIGINT itself (e.g. on ^C) #470

Open dimikot opened 4 months ago

dimikot commented 4 months ago

MacOS and Linux at least.

Sounds like concurrently tool reaction on receiving SIGINT is exiting with code 0 (instead of exiting with e.g. code 130 as the best-practice reaction on Unix processes). This breaks scripts like:

#!/bin/bash
set -e # stop on failed commands
concurrently "one" "two"
echo "still here"

When ^C is pressed in the above script, it still prints "still there", although it should not.

The reason why proper SIGINT handling is important is also described here: https://mywiki.wooledge.org/SignalTrap#When_is_the_signal_handled.3F, section "Special Note On SIGINT and SIGQUIT".

Repro

Console 1:

$ node_modules/.bin/concurrently --version
8.2.2
$ bash -c "node_modules/.bin/concurrently 'exec sleep 1000' 'exec sleep 2000' && echo Exited with exitcode=0"

Console 2:

$ watch -n0.5 'pstree | egrep "sleep" | egrep -v egrep'

CleanShot 2024-04-10 at 17 54 06@2x

Then ^C in console 3:

CleanShot 2024-04-10 at 17 54 33@2x

When ^C is pressed, SIGINT is sent to the entire process group (which is -78069 in this example). I.e. SIGINT is sent to all 4 processes on the screenshot. The bug is that, when concurrently itself receives that SIGINT, it exits with exitcode=0, i.e. it tells the caller that it terminated successfully, although it's not true. According to default unix practices, the process killed by SIGINT should exit with a nonzero exit code (ideally with code=130 which is 128+SIGINT).

We can reproduce the same behavior by not pressing ^C, but by:

  1. Sending SIGINT to concurrently pid itself.
  2. OR - by sending SIGINT to the whole process group, like kill -SIGINT -78069 in the above example.

Interestingly enough, this happens only when receiving SIGINT. On e.g. SIGTERM or SIGHUP it behaves properly.

P.S. There is an ugly work-around for this:

bash -c 'node_modules/.bin/concurrently "exec sleep 1000" "exec sleep 2000" & wait $! && echo Exited with exitcode=0'

Since the shell itself also receives that ^C SIGINT (it's a member of the process group), it fails in wait call, so the message is not printed. But again, this is not a good practice (it is based on a side effect, e.g. it still doesn't help when there is no ^C involved, and only concurrently tool is sent with a SIGINT).

gustavohenke commented 4 months ago

Thanks for the comprehensive bug report! We've had this behaviour for a long while (#150 + #164), which was probably pragmatic back then.

I think it's straightforward to fix this for unix-based systems; for Windows, it's more complicated due to it always using cmd.exe even on WSL.

dimikot commented 4 months ago

Here is a longread about best practices in shell-like programs (and programs launching other programs) on how to handle SIGINT: https://www.cons.org/cracauer/sigint.html

So if I understood you correctly, exiting cleanly with code 0 was intentionally implemented in https://github.com/open-cli-tools/concurrently/pull/164, as a work-around, to not let npm print a nasty error message when ^C is pressed. If so, then I suspect that the implementation should've been different: instead of exiting with zero exitcode when a child process received a SIGINT (and thus making npm happy), concurrently should've terminated itself with SIGINT in the end (instead of exiting with any exit code, 0 or not 0). In this case, I believe npm would behave nicely. This is all explained in details in the above article.

In short, there are 3 ways to terminate a process:

  1. Just finish running the code (in C, it's like exiting from main() function). Exit code will be 0 in this case (or N if returned N from main()). In Node, exit code is 0 when the script finishes normally.
  2. Calling process.exit(N). Exit code will be N. (Actually, (1) does exactly this internally.)
  3. Running process.removeAllListeners("SIGINT"); process.kill(process.pid, "SIGINT") (or any other signal, but SIGINT is the most interesting here). This is what a shell-like program (like concurrently) should do when it detects that one of its children finished with SIGINT. Notice that this method has nothing to do with exit codes (although technically, the exit code is returned as 130, but the caller like npm or bash uses some special API to detect whether a child exited because of a signal or not).