Closed bminer closed 3 years ago
I figured I would post a few clues here... the SIGTTOU signal is what is stopping the Node process in the latter example. This is occurring because the Node process is trying to configure the terminal, but the Node process is still in the background. Once the Node process is resumed, the terminal configuration call will fail with EINTR.
I don't think this is a bug.
SIGKILL is uncatchable, no cleanup can be done by the child. Prefer SIGTERM or SIGINT for this reason, node should do some TTY cleanup when it terminates because of those signals.
I agree that SIGKILL is uncatchable by the child. But why does the terminal get all goofy when a child process is killed in this way? Again, this only happens when the stdio is inherited.
core/node (cli-env-option $% u=) % cat _.js
const spawn = require('child_process').spawn;
let child = spawn('bash', [], {'stdio': 'inherit'});
setTimeout(() => child.kill('SIGKILL'), 1000);
core/node (cli-env-option $% u=) % node --version
v7.7.1
core/node (cli-env-option $% u=) % node _.js
core/node (cli-env-option $% u=) %
I use zsh inside tmux inside xfce4-terminal on ubuntu 16.10, and can't reproduce.
But I'm used to programs that interact with the terminal (less, vim and other curses programs of course, git when its opened a pager or commit editor) sometimes leaving terminals in strange states when they are SIGKILLed, but I've never done enough coding against terminal control sequences to understand exactly why, or really cared, resetting the stty works fine.
Also, note that it is bash that is being SIGKILLed, so if bash is leaving the terminal in a poor state, that sounds like a bash problem, not node.
I suppose node.js or libuv could uv_tty_reset() + uv_tty_set_mode() when a child process terminates.
Some experimentation would be needed to see if it doesn't have adverse side effects. Also, it's probably hard to write a regression test for.
@sam-github Good points, but when I run a new instance of bash
within bash
and then SIGKILL the child, there are no weird terminal side-effects. Maybe the parent bash
process fixes up the terminal when the child process dies?
Node just behaves a bit strangely here, and I cannot figure out why. If I type in the terminal while the Node parent process is running, my text is echoed back as normal (although not consumed by Node because stdin is paused by default). Once my child process is SIGKILL-ed and I type in the terminal while the Node parent is still running, the text is not echoed back any more. It's almost like the parent Node process isn't detecting that it should be in the foreground. This only happens on SIGKILL and only when the child process inherits the parent's stdio. Another clue is when the Node parent tries to read from the terminal after the child is SIGKILL-ed, it gets the SIGTTIN signal, which by default stops the process. The SIGTTIN signal should only happen when a background process reads from the terminal. But why is the parent Node process still "in the background" when its child process has died? Is it because Node is not handling the SIGKILL case properly?
Any thoughts?
node doesn't receive SIGKILL, there is nothing to handle here wrt. SIGKILL.
I think the child bash is setting the terminal to be owned by itself, and doesn't relinquish on SIGKILL, so the OS/terminal thinks that node doesn't have the right to read from it. When you spawn bash from bash, the parent bash knows it is always supposed to be a terminal owner, because it implements the POSIX job control shell features. As I said, I can't repro your problem, so some shells are capable of realizing what happened and fixing it.
If node always made itself control the terminal, would that be right? It shouldn't steal control back from its childeren and what if it spawned multiple children?
Anyhow, this is all very abstract. Do you actually have a problem? Can you provide a repro? Why are you SIGKILLing your child process instead of SIGTERMing it, when we know SIGKILLing complex terminal programs is known to leave terminals in poor states (which is why stty sane
exists)?
also, some of the man pages suggest you may be able to attach a signal handler to SIGTTIN to avoid getting the process getting stopped.
@sam-github - I think maybe I am miscommunicating. Let me start over. This bug only happens when:
Here's why I think this is a Node problem:
bash
, another node
process, whatever.For example, the following code suffers from the same problem:
const spawn = require("child_process").spawn;
let child = spawn("node", [], {"stdio": "inherit"});
setTimeout(() => child.kill("SIGKILL"), 1000);
That is Node spawning a Node child process. When any Node-spawned child process dies with SIGKILL, the terminal gets messed up. That's why I think it's an issue with the Node parent process in how it handles children killed with SIGKILL. But, I could be wrong. I know based on the strace that Node isn't modifying the terminal by default. Yet, when SIGKILL-ed, the terminal is wacked out.
Why SIGKILL? Because maybe the child process isn't responding to SIGTERM or taking too long to die.
I am using gnome-terminal-server
and bash
to launch the Node parent process. The same problem happens in the raw getty
terminal. It might be Linux-specific, but it happens everywhere for me.
Also when adding a listener to SIGTTIN, I get weird behavior. The Node process continually wants to read from stdin, but it continues to get EINTR from the OS (verified this using strace... and CPU hits 100%). See table here.
Just for fun, I tested in Node v7.8.0. Same problem. I'm not sure that this bug is 100% fixable, and it's certainly not a high priority. But... I figured I'd mention it anyway.
@sam-github @bnoordhuis Given the description in https://github.com/nodejs/node/issues/12101#issuecomment-290417253, does this seem like a bug in Node.js? Should this stay open?
I haven't been able to reproduce this myself but it's possible that https://github.com/nodejs/node/issues/12101#issuecomment-290068186 would address it. @bminer Perhaps you want to have a go at implementing that?
Also when adding a listener to SIGTTIN, I get weird behavior.
Yes, that doesn't surprise me. Weird behavior is the expected behavior if you override the default action for that signal. :-)
@bnoordhuis - I'm pretty sure that uv_tty_reset()
+ uv_tty_set_mode()
after a child process terminates would fix the problem because stty sane
fixes the terminal.
But... I can't help but wonder why this issue occurs in the first place...
How would this fix work? Put the code in the "exit" handler of the child process?
@bminer No, of the parent process. When it receives a SIGCHLD signal, it resets the tty. Libuv is probably the right place to do it.
Hi,
How can user space application avoid messing up terminal.
Any solution found to this ?
Thanks, Venkatesh.
Just looking at this (interesting) issue! Thanks for posting it, @bminer!
A couple things to understand about Unix:
All Unix processes are grouped into "process groups". A typical example of a process group are a series of programs run as a pipeline in a Unix shell, like ls -l | grep name | less
. When the shell spawns those 3 processes, it will put the 3 of them together in a new process group.
Every TTY device has a "foreground process group". As the TTY subsystem processes incoming keystrokes, when it sees combinations like CTRL+C, it converts those to signals sent to the foreground process group. Also, if a process not in the foreground process group tries reading keystrokes from a TTY device, the TTY subsystem sends SIGTTIN to it.
How is the foreground process group for a TTY set? The C-level API for this (defined by POSIX) is tcsetpgrp
. On Linux, tcsetpgrp
is a wrapper around an ioctl
on the TTY device file.
So when a shell launches a process, it uses tcsetpgrp
to put the new process in the 'foreground' (unless you put &
at the end of the command). Then, it makes a wait
system call to wait until the child process exits. Then, it calls tcsetpgrp
again to bring itself back into the foreground.
It sounds like Node is missing that last step.
Having said all that, your repro code actually works fine in my terminal and doesn't cause any adverse effects. 😆
Hmm. Just discovered something interesting. When a shell like bash
runs a command, it puts the child process into a new process group. However, when Node launches a child process using the child_process
module, it doesn't put the child into a new process group.
So there is no issue of changing the foreground process group for the terminal!
(Hope I'm not mistaken here! I am just doing some experiments using strace
.)
Can @bminer confirm if this issue is still occurring?
@alexdowad This exact issue is no longer occurring, but there is still some strange behavior.
I can replicate weird stuff on Linux Mint by doing the following (Node v14.15.3):
const spawn = require("child_process").spawn;
let child = spawn("bash", [], {"stdio": "inherit"});
setTimeout(() => child.kill("SIGKILL"), 5000);
// Be sure to copy/paste this comment, so that you ensure the 5-second timer is started
<Enter>
a few times before 5 second timer expiresThanks! I'm also on Linux Mint, running Node 16.0.0-pre (built from source). When Node is launched from zsh
, everything works well (I tried in 2 different terminals in case that might have an effect). When launched from bash
, then indeed, the shell stops echoing my keystrokes after Node (and the child shell it spawned) exit.
Hmm. This is interesting. strace
shows that a lot of... stuff is happening in the background.
First of all, although Node doesn't put child processes in a new PG, bash
puts itself in a new PG after it starts up, and it also takes control of the TTY using tcsetpgrp
.
The Node REPL is oblivious to this and keeps trying to read and write to the TTY as before, which is probably what it should do. As a result, it gets stopped by either SIGTTOU
or SIGTTIN
. Whichever one actually gets sent is basically random; it just depends on whatever Node happened to be doing when bash
takes control of the TTY.
One strange thing is that in strace
, I can see bash
trying to take control of the TTY more than once, and only the second time does Node actually get SIGTTIN
/SIGTTOU
. Before that, it continues writing to the TTY just fine. No idea why that is happening.
By the way, one weird thing about the Node REPL is that it writes to its output byte by byte. As in, one syscall for each and every byte. Seems like a funny way to do things, but maybe there is a good reason for it.
Now, both bash
and Node are doing things with the properties of the TTY. This is bash
:
[pid 287432] ioctl(0, TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
[pid 287432] ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig -icanon -echo ...}) = 0
[pid 287432] ioctl(0, TCGETS, {B38400 opost isig -icanon -echo ...}) = 0
That's bash
reading the TTY's properties, seeing that special key combos like CTRL+C are disabled ("-isig
"), and re-enabling them. It seems to disable and re-enable those on a regular basis.
And here's Node:
[pid 287406] ioctl(19, TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
[pid 287406] ioctl(19, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
[pid 287406] ioctl(19, TCGETS, {B38400 opost isig icanon echo ...}) = 0
The difference is that the Node REPL also seems to disable and re-enable echo on a regular basis. But strangely, it doesn't seem to do anything in particular between disabling and re-enabling echo. At the same time, it also switches TTY canonical mode on and off.
So it looks like: before it got stopped by SIGTTIN
/SIGTTOU
, Node had switched echo and canonical mode off. Then, it suddenly gets frozen by one of those signals, leaving the TTY with echo and canonical mode off. The parent bash
(which launched Node) is waiting in a wait()
syscall, and when Node gets frozen, that syscall returns with a notification code telling the parent "your child is stopped". (See wait
manpage for details.) So the parent shell treats Node as a 'stopped background job' and carries on, printing a command prompt. It seems that the parent shell has no problem printing to the TTY; I don't know why. Probably it retakes control of the TTY using tcsetpgrp
again?
In the meantime, the grandchild bash
(which was launched by Node) is still carrying on. It also doesn't seem to have any problem printing its prompt to the terminal. Before long, grandchild is killed by SIGKILL
. I don't know if Node is supposed to reap dead child processes on the main thread or a different thread, but anyways, it doesn't do that, and because Node is still alive (but stopped) the grandchild isn't reparented to any other process, so grandchild stays around as a defunct (or 'zombie') process.
Meanwhile, grandparent bash
is still happily doing its thing, but it doesn't seem to notice that echo and canonical mode are off. However, if you just type fg
, then Node finally casts off its slumber, continues with what it was doing (including turning TTY echo and canon mode back on) and everything goes back to normal.
I haven't strace
d zsh
, but my guess is that zsh
probably resets all TTY properties every time it retakes control of the TTY, which would explain why it doesn't get "messed up".
OK, more information:
When Node switches echo on and off for the TTY, it's happening right here:
https://github.com/nodejs/node/blob/2c8751cb855af0b93a92286e82c0c1ca4fd3bfdc/lib/repl.js#L489-L523
Do you see those calls to _setRawMode
? Those are what are messing with the state of the TTY. And the reason is clearly explained in the comments: when you are typing at the REPL, they want the TTY to be in 'raw mode' (probably because the REPL has its own special handling for backspace characters, etc), but when evaluating code, the TTY state is switched so that special key combos like CTRL+C will raise signals. Then after finishing evaluation, they switch it back.
This is based on uv_set_tty_mode
from libuv
:
One interesting point here is that libuv
only has a couple of choices for 'TTY mode':
https://github.com/libuv/libuv/blob/4ddc292774be827a297449e2d5ef4047c85de7ca/include/uv.h#L726-L733
When Node sets the TTY mode to UV_TTY_MODE_NORMAL
or UV_TTY_MODE_RAW
via libuv
, this actually sets multiple properties on the TTY: canonical/raw mode, echo on/off, and signals on/off are all changed. The REPL is making these calls because it wants to ensure that signals are on, but at the same time, other properties are being unnecessarily switched back and forth.
Be that as it may, if this was changed, it wouldn't help to resolve @bminer's issue. The point is that the REPL wants to operate with the TTY in 'raw mode' most of the time, and if you abruptly kill (or freeze, as is happening here) a program while the TTY is in raw mode, and the shell doesn't switch it back, then... your shell will be "messed up".
I think this really comes down to weaknesses in the original concept of TTY devices and how they work. You have this 'bag of mutable state' which is the TTY properties, with different programs switching those properties around, and if any of those programs exit without cleaning up, the terminal is left in a weird state.
In this connection, it's interesting to note that when Ken and friends at Bell Labs redesigned Unix from scratch, they completely ripped out all the TTY stuff.
If bash
can be 'fixed' so that it always sets TTY properties to a known state when taking control of the TTY, that might help. Of course, the bash
guys may have good reasons for not doing that.
Or, just come over and enjoy zsh
. (It's nice and warm over here!)
I think this issue can be closed.
@alexdowad - I agree. This issue can probably be closed. This used to happen even without using the REPL, which was annoying. Now, I think Node is doing a good job leaving the TTY alone when the REPL isn't used.
This is a weird bug for sure, but I now agree with @sam-github that SIGKILL on a bash terminal is asking for trouble, as it does not give bash time to "fix" the TTY properties (i.e. set echo mode, etc). I think I first saw this bug when I tried sending SIGTERM and SIGINT to a bash child process. When bash didn't listen after 10 seconds, I would send SIGKILL, and in this case, the TTY would get messed up.
I'm going to close this, but please feel free to re-open if you enjoy working on difficult, tedious, low priority issues. ;-)
And, Happy New Year! Thanks for working to make Node better!
When a SIGKILL signal is sent to a child process that inherits all stdio, TTY clean-up stuff does not happen properly once the child exits.
Consider this code:
After running the above program, I need to
stty sane
to fix my terminal. Interestingly, this program does not exhibit the problem:The terminal is cleaned up properly after running the program, but now the Node parent process will "stop" once the child process (i.e. bash) is killed.
I should emphasize that the strange behavior is exclusive to the SIGKILL signal. SIGTERM works fine.
Could it be that clean-up code is not properly run when the child process is killed in this manner?