microsoft / node-pty

Fork pseudoterminals in Node.JS
Other
1.47k stars 241 forks source link

proposal: gain granularity under UNIX by refactoring of the module #73

Open jerch opened 7 years ago

jerch commented 7 years ago

@Tyriar Not sure how to name it - well I think at least the unix part of this module is kinda broken by design. I've done some own investigations by trying out a much simpler pty implementation under linux and came to the following conclusion:

The C part of the module does to much at once, esp. the Ptyfork is heavy weighted - it tries to deal with forkpty, termios settings, the exec* call and registering of the waitpid callback at once - my prosopal is to separate those things into single calls exported to JS and gain much more interactivity in JS land. Also it would make pty.open useful again. Atm pty.open is a dead end since the needed symbols from C are missing.

Changes of my proposal:

Note that those changes would degrade the module to basic C semantics. Therefore it would need to set up higher level stuff in UnixTerminal.

Tyriar commented 7 years ago

unix/pty.cc is the file I've touched the least since forking. So basically you want to expose more stuff in the native API and do more of the handling in JS?

jerch commented 7 years ago

Yep. I stumbled over the somewhat monolithic c part and had debugging problems in #72 due to it. Separating the basic functionality aspects and exporting them individually to JS would make hunting down those bugs easier.

jerch commented 7 years ago

@Tyriar Want to bring back this proposal since the current implementation seems to cause several issues (see #134 #85 and #86). For #86 I had a closer look at libuv's uv_spawn implementation which is used by node's child_process module. Imho the most elegant way would be an abstraction of the pty functionality in libuv itself, but this would break with the "work on all plattforms easily" philosophy, esp. windows with the quite different internal semantics and the dependency of winpty will break this. A similiar proposal got rejected serveral years ago (https://github.com/joyent/libuv/issues/573). Also libuv's process handling makes several assumptions about the standard IO channels that cant be applied to a process behind it's own pty easily, which means a big effort to integrate it nicely into libuv. Guess we are stuck to the 3rd party module thing.

To get those isssues above solved a bigger rewrite of the module is needed imho. I hesitate to do such a big rewrite, therefore I want to know want you think about it first.

Tyriar commented 7 years ago

@jerch I think this is a good direction to go, just to clarify, here are the goals:

If this is accurate I propose we tackle this in chunks:

  1. Come up with and agree on the exact TS/C++ APIs in this issue
  2. Implement the new methods in TS/C++ with tests (this could be done one-by-one if we wanted?)
  3. Move PtyFork's logic to TS
jerch commented 7 years ago

Agreed. I gonna try to build a first C++ API idea to address the cumbersome C functions like fork, exec* and waitpid. With those tamed we should be able to give people access to the spawn functionality in pure JS (first point on your goals list).

jerch commented 7 years ago

@Tyriar Made some progress with an early C++/JS API - see https://github.com/jerch/node-newpty

It is a very early version with hardcoded exec* and only tested under linux (Ubuntu 14 LTS). openpty and forkpty is written in JS from C primitives exported by the module. From here on I have to test the compatibility with node's stream objects settings first before I can go on completing the transition. Also I want to keep the high level module API untouched if possible. Please let me know what you think about this first approach.

jerch commented 7 years ago

@Tyriar Regarding issue #85 I end up at the same problem as before - if the pty device is non blocking it dies as soon as the child process exited discarding leftover data in the pty pipe. Starting it blocking solves this but reintroduces the problem described here https://github.com/chjj/pty.js/issues/103#issuecomment-77676106

No clue yet how to deal with it in a convenient way other than here #86

Tyriar commented 7 years ago

@jerch I think we should lay out what we would expose in terms of a TS method signatures to discuss before jumping into impl, eg.

function forkpty(options: { blah })
function ptsname(masterFd: number)
// etc.
jerch commented 7 years ago

@Tyriar Im far from API sanitizing, still trying to comprehend the conceptual DOs and DONTs with libuv streams and the pty file descriptors. Atm the blocking vs non blocking drives me nuts, non blocking seems to be unreliable with node builtins and Im trying to work around this.

Edit: Found a non blocking solution with a polling thread for each open pty master_fd. It basically keeps ignoring POLLHUP until no further POLLIN occurs and hangs up afterwards. Downside is the usage of 2 more pipe()-pairs to redirect data to JS, gonna try to simplify and cleanup this approach.

Edit2: Pushed a cleaned version with the poller thread (reading only so far). Fixes #85.

jerch commented 7 years ago

@Tyriar Pushed a cleanup version. You can test it with test.js. stdin and stdout are working as expected now. Its a pity that the solution has to reassemble basic libuv features like the polling just to skip the POLLHUP while POLLIN is still available, well so what. Can try to establish a solution for stderr next, this is much easier now with the new granularity. The drawbacks noted in #71 are still valid and might break OS compatibility - how does winpty handle this?

Some calls are still missing (esp. the exec* family) but the tough stuff is IMHO done with the IO channels working. From here on I can start layouting the low level API with JS/TS. This would be the starting point for the higher level API onwards.

Tyriar commented 7 years ago

If stderr is trivial with these changes I guess we should go for it and block it off on Windows with an exception or something.

jerch commented 7 years ago

@Tyriar Hmm as I thought - the stderr thing is completely unreliable and highly depends on the slave process itself - bash for example drops back to buffered pipe mode if stderr is a pipe (stdout without any escape sequences). If stderr is another pty the OS cant even launch its own process group (only tested on linux, guess the OS tries to attach a second controlling terminal, which must fail for sure) and bash drops back to simple pipe mode where every IO happens over stderr. Very weird - gonna have a closer look at the llvm code to understand the limitations.

Tyriar commented 7 years ago

Sounds like it's very much non-trivial and we're asking for a lot of complexity by pursuing this 😛

jerch commented 7 years ago

Yep, the whole pty subsystem is not intended to have a separate stderr channel, gonna stop hacking against the OS again ;) - So no extra stderr channel for now...

jerch commented 7 years ago

Done with the basic C++ transition. Features of the new native lib:

Feel free to play with it (see text.js for an example). The fork & exec* within JS needs extra testing under OSX since forked processes are very limited there for security reasons (only tested under linux so far). Gonna do OSX, BSD* and Solaris tests next to ensure the C++ part is supported across the most common POSIX systems. Also I would limit the tests to node versions >= 4 (Older versions are likely not to work due a much older libuv and nan module API).

jerch commented 7 years ago

OMG here comes the low level fun - OSX does not support pty file descriptors with poll. Not sure about kqueue support there, have to test it. Seems the POLLHUP for slave hangups is linux specific. Might need platform dependant poller implementations. Bummer.

jerch commented 7 years ago

Update: The poller works now with OSX (tested only on OSX 10.10 though).

Edit: Done with the basic OS compat tests. Only minor adjustments were needed, the C++ part runs with nodejs >= 6 on:

jerch commented 7 years ago

@Tyriar Started to move the module to TS and added some first basic tests. Please have a look at it, esp. with TS I need your help (not even sure, if it makes sense what I did at the low level). Imho we can start with the API design.

Tyriar commented 7 years ago

Had a look, it's really quite a radical change (literally everything). I'm actually surprised just exposing fork() like that to JS works as well, seems like a risky thing to do as it's not typically done in a native node module?

How would you go about launching a new shell with environment/args with this new API?

jerch commented 7 years ago

fork: The internal states of v8 should be save to be forked (JS code itself is synchronous), only problem I see here is node's event loop - libuv's event loop was not fork safe until recently and any async stuff in the child is likely to fail. But thats not a problem if the child only exec*s. Any synchronous stuff should still work in the child with the exception of loading new native modules under MacOS (its forbidden there to load new libraries into forked processes before exec).

new_shell: This is what test.js does atm as a high level test case. With execve('/bin/bash', ['/bin/bash', '-l'], process.env); it launches a login shell with custom environment. A spawn implementation would be build with something like that code.

Tyriar commented 7 years ago

The main issue is I would have no idea what was going on with fork if I didn't learn C in university, I wouldn't expect many JS devs to know what is going on. Perhaps it could be done using 2 callbacks instead?

jerch commented 7 years ago

I think it is possible to drop the fork & exec stuff and use childprocess.spawn instead (as proposed in #134). Edit: See test4.js and spawn2 for a child_process based version.

Tyriar commented 7 years ago

Would something like this work?

forkpty(
  opts?: I.OpenPtyOptions,
  masterCallback: (pid: number, fd: number) => void,
  slaveCallback: (pid: number, fd: number) => void
)
jerch commented 7 years ago

Yes that should work. The master (parent) callback is not needed though. I suggest to "enclose" the child completely in the callback and anything after forkpty would be parent context anyways, like this:

forkpty(
  opts?: I.OpenPtyOptions,
  slaveCallback: () => string
)
{
    let nativePty: I.NativePty = openpty(opts);
    let pid: number = native.fork();
    if (pid < 0) {
        fs.closeSync(nativePty.master);
        fs.closeSync(nativePty.slave);
        throw new Error('error running forkpty');
    } else if (pid === 0) {
        // child
        fs.closeSync(nativePty.master);
        native.login_tty(nativePty.slave);
        let error: string = slaveCallback();
        process.stderr.write(error);
        process.exit(-1);
    }
    // parent
    fs.closeSync(nativePty.slave);
    return {pid: pid, fd: nativePty.master, slavepath: nativePty.slavepath};
}

It still exposes the limitations of the child in the callback though, the callback code must be synchron and exec early. A call would look like this:

let forked = forkpty(options, () => {return native.execvp('bash', ['bash', '-l']);});
// do something with the symbols in `forked`

If you still feel uneasy about this we could even hard code the exec stuff and just let people use that higher API, where they can only set parameters to the exec calls. Thats what spawn does atm (plus registering an exit callback).

jerch commented 7 years ago

About child_process - it is currently not possible to use uv_spawn for that, it basically lacks the TIOCSCTTY ioctl, but it could be done with a helper binary (as child_pty does). It would give us the ChildProcess-API for free and make those own fork&exec stuff pointless. What you think?

jerch commented 7 years ago

Yes with the helper binary it works under all POSIX systems I can test here (see https://github.com/jerch/node-newpty/tree/child_process). Maybe we should build the unix stuff based on that idea? How hard would it be to move the windows stuff to the ChildProcess-API as well?

Edit: Cleaned up that branch - much shorter now, most of the code is obsolete with child_process.

Tyriar commented 7 years ago

Windows is mostly a black box for me beyond the winpty public API atm https://github.com/rprichard/winpty/blob/master/src/include/winpty.h

jerch commented 7 years ago

I have implemented a separate STDERR pipe option. Tested with a small helper binary as direct child, grandchild (behind bash -l) and great-grandchild (bash -l && bash -c HELPER). The trick to get this working was not to apply the full login_tty semantics but only a part of it. Works with all systems here. :smile: It still might confuse child programs, therefore it is not enabled by default.

jerch commented 7 years ago

Set up a RawPty class to host the lowlevel pty stuff like size and termios handling. Works on all platforms except Solaris, seems I am replaying old issues: https://github.com/chjj/pty.js/issues/14 :wink:

jerch commented 7 years ago

Done with the basic implementation. Features now:

Gonna try to plant it under the current API for compatibility. Also Solaris keeps bugging me, need to find a fix for the awkward pty behavior there. Under NetBSD libuv tends to crash, no clue yet why. Works under Linux, OSX, FreeBSD and OpenBSD and passes all tests.

jerch commented 7 years ago

@Tyriar The basic stuff is done, got a working UnixTerminal class that is almost compatible with the interface from node-pty (copied the tests). Added some advanced usage features to the classes RawPty and Pty beside the normal "spawn a shell at pty" use case. Not sure if it will stay since it triggers libuv errors on NetBSD occasionally.

The IO poller is on par to slightly faster compared to node-pty for linux, but seems to have worse performance on OSX and NetBSD. It does non blocking read/write cycles and drops back to poll once all channels would block. It is the second incarnation, the first one only relied on poll without the nonblocking cycles and was only half as fast as node-pty. Gonna try a third version with uv_poll directly.

jerch commented 7 years ago

Nope does not work with lib_uv, neither with uv_poll_t nor uv_stream_t.

jerch commented 7 years ago

@Tyriar Had some ehem fun with node's event loop, tried to get rid of the polling thread with those additional pipes - well, without success. The libuv primitives for IO polling are not capable to handle the special POLLIN & POLLHUP condition that happens on the pty master file descriptor. Also falling back to libuv loop primitives like uv_prepare_t or uv_check_t and doing the poll&read in the callbacks does not help, the loop won't run unless some other event keeps the loop running.

Nonetheless the new implementation even with poll thread and additional pipes runs 2 times faster than the old node-pty implementation and even faster with more concurrent ptys open. Gonna cleanup stuff, then it is ready for a review.

BTW found several issues in the current node-pty for unix while testing around.

jerch commented 7 years ago

@Tyriar Did some cleanup, feel free to play around with the new unix implementation. Maybe you can also test it on a newer OSX version (can only test 10.10 here).

A few notes:

Tyriar commented 7 years ago

@jerch probably won't have time to look at this for some time unfortunately, I've been out attending events and next month will be very focused on improving Windows support on the terminal.

jerch commented 7 years ago

No problem, the source will not run away ;)

Tyriar commented 6 years ago

@jerch are there any major take aways you got from your refactor?

It would be cool to integrate some of it into the repo, it's just I don't think it would be wise to do a huge PR which changes everything. Instead safer incremental changes to make sure we don't break too much as I like to always ship the latest node-pty in VS Code Insiders, radical changes scare me 😛

jerch commented 6 years ago

@Tyriar Hmm well, imho the most annoying bug in the current node-pty impl is the truncated output if you run a process that exits right after. For typical shell usage (as with xterm.js) this is no biggy, the bug hardly will be triggered since the shell just sits there and waits for input giving the underlying pty <--> libuv loop time to process all data. For any non shell based pty usage this is a showstopper. The refactor deals with that problem by the custom poll thread with additional os pipes. Sadly this cant be cherrypicked easily as it changes the internal API to separated STDIN and STOUT.

Next big change - child_process for fork/exec - can't be "soft migrated" either, it changes the whole underlying interface (the Terminal class is obsolete).

For unix users the ability to alter the termios settings from JS might be of interest. This could be applied with small changes to node-pty (see https://github.com/jerch/node-newpty/blob/master/src/pty.ts#L380, which is almost a copy of the current C code in node-pty).

Tyriar commented 6 years ago

@jerch so i think we do see the issue out dropped output in vscode as we now use the terminal for short-lived tasks that run one command and then terminate, Right now I workaround this with the following code:

https://github.com/Microsoft/vscode/blob/f4aa1247306e63c0c10ef791cf8c903be4d1595a/src/vs/workbench/parts/terminal/node/terminalProcess.ts#L51-L61

Changing the Terminal class is kind of just changing the entire library and might be too radical to pull into this (as no one would bother upgrading).

I'm also curious what you thought of https://github.com/Tyriar/node-pty/issues/151

jerch commented 6 years ago

@Tyriar The right place to fix this bug once and for all would be the libuv poller, it just cant handle the special condition that happens on the master fd (POLLHUP with pending data to read), it closes on POLLHUP.

About the Terminal class - yepp, I agree. Thats why I implemented the UnixTerminal on top of new child_process based implementation. I still think that being compatible to the more common child_process API is a plus.

Tyriar commented 2 years ago

I'm moving all issues related to open under this one as right now it's not in the typings and the state of it is a little unknown as I don't use it and we don't have tests for it.