microsoft / node-pty

Fork pseudoterminals in Node.JS
Other
1.42k stars 233 forks source link

Proposal: Rewrite of native API using rust and N-API #507

Open daniel-brenot opened 2 years ago

daniel-brenot commented 2 years ago

Issue description

I am looking into rewriting the existing implementation of this native library in rust with N-API. I am working on a branch currently where I will publish the changes I am proposing, and I would love feedback on it once I publish the branch as WIP.

Reasons for the rewrite

There are multiple issues regarding cross platform support and Node version that would all be resolved in a rewrite in rust. It would also make solving issues such as the blocking semantics mentioned in #85 much easier.

How does the community feel about this?

If I do write this and make sure there are no errors, will this be used?

jerch commented 2 years ago

I am a fan of a full rewrite of this lib, esp. to fix those nasty low level blocking bugs, and to expose somewhat more control to the outside (termios interface). Idk if Rust is a good choice for this, is that easy enough to integrate for CI stuff and such?

On a sidenote: I already fixed #85 for POSIX systems way back (in C++), but it was not merged, because it contained too many internal chances.

daniel-brenot commented 2 years ago

Rust will definitely still integrate well with CI from my understanding. I will take a look at #85 as well to make sure that those changes would be integrated as a part of this.

From the rewrite that i've done thus far with no changes to how the API functions, the code is already quite a bit smaller and easier to understand.

I am hoping that this change will be welcomed by the community and help to advance this library moving forward.

Tyriar commented 2 years ago

Can't say for sure without seeing it, but Rust would probably be way better from my perspective and would force me to learn it more which I've wanted to do for a while anyway.

daniel-brenot commented 2 years ago

After looking at #85 my first response would be that I am going to aim to do this work with the current working version on master, and then i'll add enhancements from #85 after if it is still decided that we would like that API added in. For the moment it is just easier since there is already a decent bit of code I am rewriting.

Tyriar commented 2 years ago

@daniel-brenot :+1: the smaller the change the easier it will be to merge

jerch commented 2 years ago

@daniel-brenot I hope you can find a better way around the POLLHUP issue on linux, than my noisy poll loop. Should be possible with a dedicated polling thread injecting the data into the JS runtime. If you wonder why I re-routed the data packages via OS pipes - that was the only way I found working with theSocket interface on JS side. Maybe thats easier now with N-API primitives, idk.

daniel-brenot commented 2 years ago

@jerch I think once the initial migration is done we can definitely look at what's available and make an issue to do this.

jerch commented 2 years ago

Sure thing. Another issue I want to point out before you get your hands dirty into the wrong direction - #487 and its issues behind. It deals with a recent fork slowdown under BigSur, but got not yet merged. So be prepared for some forth and back changes on your side as well.

daniel-brenot commented 2 years ago

Created a branch with the new changes. So far it's mostly just the one file and it's still incomplete, but it is taking shape. https://github.com/daniel-brenot/node-pty/blob/rust-port/src/lib.rs

Brooooooklyn commented 2 years ago

With rust, you can implement Node.js library based on existed Rust crates, like https://crates.io/crates/portable-pty

daniel-brenot commented 2 years ago

@Brooooooklyn Great suggestion! I've looked at that library and it doesn't seem to provide enough functionality vs what we currently have to justify it in this use case.

Most of the code in the rewrite is due to the fork() function. The rest is relatively trivial.

Thank you for the suggestion though :)

daniel-brenot commented 2 years ago

@Tyriar How would you feel if the process method threw an error instead of returning undefined? It would align with the rest of the methods that throw errors when they fail. It's currently implemented as returning undefined in my version, but i'm wondering if that should be changed to allow for an error message when the process call fails.

jerch commented 2 years ago

If you mean with "process method" the pty & process creation step - thats actually another open issue (see #265). Ideally the lib exposes enough error state to the caller to know why things went wrong.

daniel-brenot commented 2 years ago

46 Can also factor into this. Precompiling for multiple platforms will be a part of this as well

corwin-of-amber commented 2 years ago

Hi 👋 I've taken the code from @daniel-brenot and got it to work using napi-rs 2.0.0.alpha4. Currently open and fork work and some of the tests even pass. There are issues with encoding and signals.

Only Unix for now... I have very little experience with Windows.

corwin-of-amber commented 2 years ago

Here are the (preliminary) results of my effort. https://github.com/corwin-of-amber/node-pty/tree/rust-port

Are ppl interested in this?

daniel-brenot commented 2 years ago

That's great! I'm happy to see that you've gotten it working on unix. I definitely still intend to work on this, but work has been demanding as of late.

corwin-of-amber commented 2 years ago

Oh good! Should I make a PR? Is this repo still active?

daniel-brenot commented 2 years ago

If it's complete, absolutely. That being said, definitely ping me once you're done, i'd love to help on the work you're doing as well moving forward where i can

corwin-of-amber commented 2 years ago

The Unix port is complete. Why don't I make a PR for your fork, then you can look at it and add Windows support when you have the time.

daniel-brenot commented 2 years ago

Sounds good to me!

Tyriar commented 2 years ago

Because of the amount of changes a PR to move to rust would include, I'm thinking we can set up a branch with CI/CD so people can start selfhosting ASAP. That will also make merging easier as the it can be stabilized on that branch.

Tyriar commented 2 years ago

FYI a big refactor happened in https://github.com/microsoft/node-pty/pull/487

Also, it might be best to maintain a fork for now and consider merging into node-pty when it's more stable (or continue on as a hard fork). I'd hate to see progress stalled because of me being the bottleneck to merge things.

daniel-brenot commented 2 years ago

Bumping this as a heads up, I am still working on this. I was on a bit of a hiatus, but i am currently working on adapting the program to use the modified build process for NAPI and doing some more touches on ensuring that everything is safe. Thanks a ton to @corwin-of-amber for his contributions to my branch, they helped a ton.

daniel-brenot commented 2 years ago

As an update, i think i'm going to update the CI for my branch to run and validate the code against the tests so i can see if my branch is passing. Additionally, i would like to look at bundling in the .node files for multiple platforms as a part of the release process. It would mean that users wouldn't need to compile for their platform if a .node file is found. It's quite simple to implement so i'm trying is out on my branch to see how well it works.

corwin-of-amber commented 2 years ago

Sounds wonderful, thanks @daniel-brenot!

daniel-brenot commented 2 years ago

Does anyone know the purpose of exposing the native module? I noticed this is done only on linux platforms and i'm curious as to why. Is this used by VSCode or other libraries? I need to know because some of the changes I made will break the exposed native module if I don't account for it, so i need to know wether I strictly need to maintain the current behaviour, or if a small change to it is acceptable.

Edit: Heres the file i'm refering to: https://github.com/microsoft/node-pty/blob/main/src/index.ts

daniel-brenot commented 2 years ago

@corwin-of-amber Should I add your name to the copyright notice as well for the files you assisted with? If so, should I add the name that is on your github profile? (Shachar Itzhaky)

corwin-of-amber commented 2 years ago

I definitely like having my name on things 😄 as long as it does not block others from using this code as free software.

daniel-brenot commented 2 years ago

Great! Also if you could have a look at the code on my branch it is fairly close to being complete, but with something preventing it from connecting to the named pipes on windows. I may take a look at it some time this week to see if i can find out why, but if you feel like having a look and seeing if you can find it then i'd appreciate annother set of eyes :)

corwin-of-amber commented 2 years ago

Ok, I'll try. IIRC the prototype I made earlier did connect to the pipes, and had a problem only with cleanup (leading to the node-not-exiting issue). I can try to step-debug the two versions.

daniel-brenot commented 2 years ago

The prototype you made connected to the existing c code in order to perform many of the tasks. I have completely rewritten the c code, so now i'm pretty sure theres a mistake in how i allocated something in rust that gets passed into the native windows FFI

corwin-of-amber commented 2 years ago

Oh right! That is actually good progress, interfacing the C++ code in order to forward the calls to WinAPI was sooooo clunky. But perhaps we can still learn something from the old behavior.

daniel-brenot commented 2 years ago

As a random aside, it seems like this project may now support thread safety. The way the pty handles are stored now works across threads using a mutex, so that might be one of the many issues this fixes.

daniel-brenot commented 2 years ago

@corwin-of-amber The issue may be due to windows antivirus on my system. I was reading the main README file for this project and realized that is one of the things this project warns against. At the very least i'll see about adding error codes to the error messages so it's easier to debug what is going wrong.

chrmarti commented 2 years ago

@daniel-brenot Is this close to be ready for testing? Anything we can help with? (Working on a CLI where a prebuilt node-pty would make installation easier. I believe we need N-API to keep the number of prebuilds low.)

daniel-brenot commented 1 year ago

@corwin-of-amber Are you willing to help out again? It compiles on windows currently and the tests all pass except 6. I've fixed a couple issues but i'm run a bit thin here. @chrmarti If you want to look as well feel free, any help is appreciated. I'm going to turn my effort towards the linux side for a bit and see how the tests are faring there.

daniel-brenot commented 1 year ago

@jerch Feel free to look at this as well.

daniel-brenot commented 1 year ago

I've looked through the list of issues and it looks like completing this would solve at least these issues:

17

46

112

405

564

558

557

551

518

438

Just for reference of why i'm very entheuseastic about completing this issue.

corwin-of-amber commented 1 year ago

@daniel-brenot thanks! I will look at it, when I have access to my Windows machine.

lydell commented 1 year ago

Do you think there’s value in doing N-API first, and then Rust?

I tried out switching to N-API. The Unix version compiles. Running bash works. Running a command with arguments crashes – I have never written a line of C++ before so I don’t know why. But it still felt promising!

https://github.com/microsoft/node-pty/compare/main...lydell:node-pty:napi

If we manage to do N-API, maybe we can then compile to WebAssembly and get a cross-platform prebuilt package! (In case winpty causes issues for wasm, I looked at removing it here: https://github.com/microsoft/node-pty/compare/main...lydell:node-pty:remove-winpty)

A Rust rewrite could come later to fix other issues.

Not sure if any of this makes sense, but just in case! I’m excited about getting rid of node-gyp in one of my projects using node-pty.

daniel-brenot commented 1 year ago

Doing them both at the same time makes sense to me. The work to convert to N-API is relatively small in the work of the rust port. It would be a large amount of effort to duplicate it in c++, only to have to do it again in rust. I think moving forward with finishing this would be a better solution.

As far as using webassembly, I don't believe that is possible, and even if it is, it wouldn't provide any benefit. The application needs to be built for each platform individually because it's specifically a bridge to native os-specific functions. A wasm build can't have the code for windows, mac and linux in it because they need to be linked on each system separately.

daniel-brenot commented 1 year ago

Linux tests seem to all be passing. I see 16 tests passing, so as long as there aren't tests that should run but don't, then linux support should be good to go. I still have to look into macos, so i'll see about setting up a mac vm for that. Mac support should be straight forward since it's mostly the same code but with a couple code paths different.

daniel-brenot commented 1 year ago

Just tested including this into eclipse theia. Looks like it works great now for being included into other projects and we just need help debugging the windows issue before this is ready to be squashed and made into a pr

corwin-of-amber commented 1 year ago

sry, it took me some time to figure out Rust on Windows and how to install llvm (using choco). Now I am getting errors about using an experimental feature (insert_entry). Are you using a nightly build? This should perhaps be encoded in the Cargo.toml somehow so that users can know which version of Rust to use.

daniel-brenot commented 1 year ago

sry, it took me some time to figure out Rust on Windows and how to install llvm (using choco). Now I am getting errors about using an experimental feature (insert_entry). Are you using a nightly build? This should perhaps be encoded in the Cargo.toml somehow so that users can know which version of Rust to use.

Yes, i do believe i use nightly. Sorry for not setting that in the cargo.toml, feel free to do that. Thanks for helping.

corwin-of-amber commented 1 year ago

@daniel-brenot I think perhaps I'm just too clueless in Windows, and/or I'm doing something terribly wrong. First, the branch (daniel-brenot/node-pty) did not even build for me and I had to change a couple things (add #![feature(entry_insert)] for that unstable feature and also make some variable mut), but then conpty does not run at all and just says Failed to connect named pipes and winpty fails with Error launching WinPTY agent. I am pretty sure I was able to do at least something previously, but now I just get this error that's not very informative.

corwin-of-amber commented 1 year ago

@daniel-brenot Am I even running the right branch? I am a bit confused because here: https://github.com/daniel-brenot/node-pty/blob/cde234931684e5e09274a7d8541655f38f9cc180/native/src/win/conpty.rs#L203

This pointer is going to be NULL so ofc InitializeProcThreadAttributeList will fail. Compare https://github.com/microsoft/node-pty/blob/a141a9dc23186a0634e8981af31af4e478322259/src/win/conpty.cc#L335

😖

daniel-brenot commented 1 year ago

You must not be on the correct branch then. https://github.com/daniel-brenot/node-pty/blob/rust-port/native/src/win/conpty.rs

I don't know why that was the branch you were looking at.

daniel-brenot commented 1 year ago

As for proc thread attribute list being null, if you check the docs that is how you get the size of the attributes list. You call the function twice. Once with null to get the size, and a second time with a list initialised with the size returned from the first call.

In the code you linked i was doing it wrong. In the correct branch I fixed that issue.