data events can be fired after exit is fired

Tyriar commented 7 years ago

Downstream issue/repro: https://github.com/Microsoft/vscode/issues/23051 Temporary fix: https://github.com/Microsoft/vscode/commit/9464b54f39d8db943ddd4c134d9bef835b7bd506

Relevant commit that tried to fix the problem: https://github.com/chjj/pty.js/commit/4af628538ff5f970c2c84da31ba1e20db62696c0

jerch commented 7 years ago

@Tyriar Any chance to reproduce it with a simpler case? Imho it is related to the async data cycling (with a possible buffer delay), while the exit gets through earlier. Might be related to the whole socket thingy and open pandora's box ;)

Tyriar commented 7 years ago

I expect the repro would work when plugged into examples/fork/index, by setting shell to "cmd.exe" and args to ["-c", "ls -lR"] then killing the process right when exit is fired.

jerch commented 7 years ago

@Tyriar Not sure yet if it is related, but if I run the following snippet under linux, the output will stop at different positions in data and kinda never finishes output:

var os = require('os');
var pty = require('./lib/index');

var ptyProcess = pty.spawn("ls", ["-lR", "/usr/lib"], {
  name: 'xterm-color',
  cols: 80,
  rows: 30,
  cwd: process.env.HOME,
  env: process.env
});

ptyProcess.on('data', function(data) {
  console.log(data);
});

setTimeout(function(){}, 5000);

jerch commented 7 years ago

I drilled down the last problem to node's stream handling - neither net.Socket nor tty.ReadStream will reliably output the full data. The exit of the slave program kinda always happens prior all data got handled. It seems to be a race condition between node's non blocking stream handling and the closing of the pty pipe by the exit of the slave program. Gonna try to work around this by keeping the slave fd open until no further data can be read. Not sure yet if this will help to solve the original problem.

jerch commented 7 years ago

@Tyriar What is the idea behind the 'exit' event of pty? Shall it mark the real process end (as it does atm more or less) or shall it be the last event of any possible interaction with pty (i.e. all pending data read)?

What happens under linux atm is somewhat unreliable due to the buffers and the async callbacks involved:

slave program runs and generates output --> output gets catched by net.Socket, it buffers the data and generates appropriate 'data' events
program gets killed, the module does this atm by sending SIGHUP to the child via process.kill(this.pid, 'SIGHUP') --> gets invoked immediately by libuv
program stops --> slave end of the pty pipe is closed, the async waitpid function set up by PtyFork gets invoked and cleans up the process data, sets the return status code AND installs the 'onexit' handler to the libuv loop to be invoked somewhere in the near future
next step: RACE CONDITION
- condition 1: the 'onexit' handler runs (does whatever was registered there) prior the next net.Socket data delivery invocation
  - subcondition: net.Socket internal buffers are empty, nomore data can be delivered since the pty device is closed - imho a wanted case
  - subcondition: net.Socket internal buffers are not empty and leftover data gets delivered - seems to be the issue case
- condition 2: net.Socket data delivery has no pending data and stumbles over the already closed pty pipe - the socket will trigger an 'error' event with 'EIO', the 'onexit' handler gets invoked afterwards - seems to me to be a wanted case

Back to the question above - if the 'exit' event shall mark the stop of the slave program the current behavior is correct, the program exit can happen at any time while there still might be unhandled data. If the 'exit' shall be the final event it might be better to attach it to 'end' of the pty ReadStream.

To be able to get all output data from slave until the exit happened, some more work is needed. This is due to the fact that any buffered data in pty pipe itself (not yet in net.Socket) will be lost as soon as any last endpoint gets closed (thats the issue I observed in https://github.com/Tyriar/node-pty/issues/72#issuecomment-291813706).

TL;DR: To fix it all I'd define pty's 'exit' as the last event at all and deliver ALL pending data beforehand.

Tyriar commented 7 years ago

Yes I think the pty's exit event should be triggered after the process is exited and all data events have been fired.

alexxed commented 5 years ago

2 years after, this is still an issue on Linux, and the one fix that worked for me is https://github.com/chjj/pty.js/pull/110#issuecomment-406470140

microsoft / node-pty

data events can be fired after exit is fired #72