openssh-rust / openssh

Scriptable SSH through OpenSSH in Rust
Apache License 2.0
232 stars 36 forks source link

Deviations from tokio #79

Closed mrzv closed 2 years ago

mrzv commented 2 years ago

Prior to version 0.9.0, openssh had a very nice feature that it matched very closely tokio::process::{Command, Child, ChildStdin, ChildStdout}. So it was very easy to work with both local and remote commands by spawning them in a function and returning the respective stdin/stdout for the code to interact with.

Since 0.9.0, this no longer "just works". openssh::Command::spawn is now async; tokio::process::Command::spawn isn't. ChildStdin/ChildStdout are now opaque types, so they cannot be returned in the same places as their equivalents in tokio::process.

What are the reasons for these changes? And are there any straightforward workarounds? (I'm a beginner in Rust, so I may be missing something very basic.)

NobodyXu commented 2 years ago

Hi mrzv!

I am the one who proposed and made these changes to openssh.

First, a few background information to help you understand what this crate does.

openssh connects to the remote via ssh multiplexing. It first spawns a master process managing the connection, then it communicates with the master process to launch process/open port at the remote.

Before 0.9.0, openssh spawns a new ssh process for every process to be launched on the remote (process-mux).

After 0.9.0, openssh can directly connect to the ssh multiplex master process to launch process/open port on the remote if native-mux feature is enabled and you create the Session using SessionBuilder::connect_mux or Session::connect_mux.

For the implementation details of native-mux, checkout openssh-mux-client and also checkout connection modes for details on process-mux vs native-mux. TL;DR, under native-mux mode, Session would connect to the unix socket created by the master process to communicate with it.

As such openssh::Command::spawn has to be async since spawning involves communication with the master process.

ChildStdin, ChildStdout and ChildStderr are now opaque types so that users cannot assume details of these. They actually are just newtypes of tokio_pipe::{PipeWrite, PipeRead} if you check the source code, as we convert tokio::process::Child* to tokio_pipe::{PipeWrite, PipeRead}.

We keep them opaque so that a major release on tokio_pipe would not force also to also create a new major release.

Another difference in 0.9.0 is that RemoteChild::wait is now also async and we now have our own Stdio type.

NobodyXu commented 2 years ago

@mrzv You might be interested in using the openssh::process module, which contains all types necessary launching a new process on the remote.

You can replace tokio::process imports with openssh::process to make it easier to use openssh.

mrzv commented 2 years ago

Hi @NobodyXu.

Thanks for the explanation. This mismatch with tokio is quite unfortunate for my use case, but I understand where you are coming from. I'll stick with openssh 0.8.1 for now, but I'll have to figure out if it's possible for me to upgrade; I'm guessing I don't want to fall too far behind.

jonhoo commented 2 years ago

To add to what @NobodyXu said here, this is unfortunately sort of fundamental. spawn has to be async, and arguably should be in tokio too, but I'm guessing that tokio does a tokio::spawn internally or something so that it doesn't have to be. That feels misleading though since spawn is, inherently, an asynchronous operation. As for the Child* types, they were only equivalent to the tokio::process types sort of by accident. That wasn't a sustainable long-term solution. Arguably I should have made them opaque from the beginning to avoid this particular assumption from being made :sweat_smile:

mrzv commented 2 years ago

From my point of view, the old design had a tremendous value. My use case is that I have a client-server setup, which communicate using RPC over stdio. With the old setup, I could have a local server or a remote server with essentially the same codebase (tokio::process vs openssh). Now I need to do a lot more work to support both.

mrzv commented 2 years ago

The async spawn is a minor issue; I can wrap it in a tokio::spawn. But the Child* types diverging from tokio is the real issue.

jonhoo commented 2 years ago

Yeah, I absolutely see how it had value, it just wasn't sustainable unfortunately, since it required that openssh essentially was implemented internally only using tokio::process.

NobodyXu commented 2 years ago

That feels misleading though since spawn is, inherently, an asynchronous operation.

tokio::process::Command::spawn internally calls std::process::Command::spawn, letting it does the hardwork and then tokio would retrieve its pid and stdin/stdout/stderr, setting them to be non-blocking if present, then wait on the pid.

Not sure how tokio waits on the pid, but I imagine they either using the internal signal and trigger all pending processes to call wait() since signal can be coalesced or use the new pidfd to track each process efficiently (more efficient than using signal, since polling on pidfd is precise and cannot be coalecsed).

NobodyXu commented 2 years ago

But the Child* types diverging from tokio is the real issue.

There're a few way to migrate this issue:

Finally, if this issue is really widespread, maybe I can expose a function to convert tokio::process::Std* to corresponding openssh::Std*, but this feels like a hack and I don't think this is a good idea.

mrzv commented 2 years ago

Thanks for the suggestions; I didn't know about futures_enum. I'll investigate what works. I use essrpc, so it's the question of what it will accept, but I think it just wants something that's both AsyncRead and AsyncWrite.

NobodyXu commented 2 years ago

@mrzv Are you using essrpc::transports::BincodeAsyncClientTransport?

It only needs AsyncRead + AsyncWrite + Send, which is implemented by both tokio::process::Std* and openssh::Std*.

One way of fixing this is to convert both tokio::process::Std* and openssh::Std* to tokio_pipe's corresponding PipeRead and PipeWrite on Unix.

Then you can create a newtype contains both PipeRead and PipeWrite then implements both AsyncWrite and AsyncRead for it.

Since PipeRead and PipeWrite both implement Unpin, so implementing AsyncWrite and AsyncRead for it is pretty simple, just forwarding the method to the underlying types.

mrzv commented 2 years ago

@NobodyXu Yes, that's exactly what I'm using.

I think you are underestimating just how much of a newbie I am in Rust. (And how much I appreciate simplicity of a solution.)

So far, of what you suggested, the futures_enum route sounds the most appealing. If nothing else, it avoids having to make assumptions about whether I can convert the given types to PipeRead/PipeWrite. It only assumes something about implementing a couple of traits I almost certainly need to assume the processes provide.

NobodyXu commented 2 years ago

So far, of what you suggested, the futures_enum route sounds the most appealing. If nothing else, it avoids having to make assumptions about whether I can convert the given types to PipeRead/PipeWrite. It only assumes something about implementing a couple of traits I almost certainly need to assume the processes provide.

Yeah, futures_enum is the most simple yet efficient implementation. It can be implemented with a simple derive and does not involves any boxing.