refraction-networking / water

WebAssembly Transport Executables Runtime
Apache License 2.0
18 stars 1 forks source link

Dialer does not support TCP semantics #37

Open fortuna opened 5 months ago

fortuna commented 5 months ago

Hi there, congratulations on getting this project to this state, it looks quite promising!

As part of the development of Outline and Outline SDK, I played with interfaces a lot. One thing that became clear to me is that the Go dial API isn't that great.

Most critically, Dial returns a net.Conn, which doesn't support TCP semantics. To support TCP, you need to be able to close the write end (which sends a FIN). While TCPConn has CloseRead() and CloseWrite(), net.Conn does not. That's a big omission, and the reason why we use StreamConn instead in the Outline SDK. You must be able to indicate you are done sending without tearing down the entire connection.

Other less important issues, perhaps personal preference:

gaukas commented 5 months ago

Thanks for reaching out, @fortuna.

Thank you for you and your team's contribution to the circumvention community with the Outline project and more. To address your questions/concerns:

Most critically, Dial returns a net.Conn, which doesn't support TCP semantics. To support TCP, you need to be able to close the write end (which sends a FIN). While TCPConn has CloseRead() and CloseWrite(), net.Conn does not.

Yes, you are absolutely correct and I have noticed this issue as well. However we are yet to figure out a easy way to do it: unlike other tools which has a by-all-means flexible way to block on I/O operations until the underlying operation on the fd is done, by the current VM-like structure, WATER lacks the ability to detect if the I/O operation, especially the fd_write, has been done on a network socket. In which case it is not necessarily correct for a client to invoke CloseWrite() on the network socket once it is only done writing to the internalFd. Alternatively we might be able to close internalFd to notify the WebAssembly to close the networkFd -- which would require some moderate amount of work in terms of importing syscalls from the host into WebAssembly guests.

DialContext is unnecessarily verbose, and having two Dials is redundant and confusing. Just make Dial take a context.Context.

While this might be a good advice in terms of disambiguate, it violates the consistency of function signatures comparing to net. I know you might have some different opinions on net, but undoubtedly it is still the choice of the overwhelming majority of the community and we are trying to provide something net-like with additional functions/flavors.

The network parameter in the Dial is unnecessary...

For the same reason of simply respecting the signature and support the existing projects/codebases...

Lastly, I understand your concerns and I do think they are valuable and informative. Again, WATER is still in the shape of a research project. While we hope to bring it into the community to be used in the real-world as soon as possible, there are plenty of rooms for improvement and we are trying our best to work on it. We are always open to more advice, critiques and suggestions from the community, and contributions are always much appreciated!

gaukas commented 5 months ago

Most critically, Dial returns a net.Conn

Same reason why we choose to use Dial. But would you perhaps suggest that we should do some type assertion checking and return a custom interface that embeds net.Conn instead, for the sake of compatibility.

fortuna commented 5 months ago

However we are yet to figure out a easy way to do it

CloseRead and CloseWrite would use sock_shutdown. As long as you respect the sequence of the calls to the fd, it should be safe.

My other thought is that perhaps WATER should provide higher level primitives. In particular, expose default Stream and Packet Dialers that the WASM code can use, rather than have WASM use tcp/udp.

That way the implemented transport can actually be composed, like we do in the Outline SDK.

The composition will make your transports more reusable and powerful.

For example, I should be able to do multiple hops by combining different transports. Or I can inject a base dialer that tunnels traffic to a remote server to test a transport from that server.

fortuna commented 5 months ago

While this might be a good advice in terms of disambiguate, it violates the consistency of function signatures comparing to net ... it is still the choice of the overwhelming majority of the community

net.Dialer is a struct, not an interface. So you can't use water.Dialer where net.Dialer is needed. Code like the net.Resolver, which allows injection of a dialer, actually takes a function, rather than take a net.Dialer.

You can use a net.Dialer where water.Dialer is needed, but so is the case if you just have DialContext. And, even if you provide a different signature, it's trivial to provide an adaptor (example).

Furthermore, you now force me to implement two methods instead of one in order to provide a water.Dialer, making your API harder to use. It also confuses me, since I don't know which method will be called without looking at the caller code.

It's much nicer if I just need to provide one method. Even better if you provide a constructor to turn a function into a dialer so I don't need to create types to create custom Dialers (example).

I'll note that it's also hard to implement your water.Listener interface. Why does it need to implement net.Listener? And what's the Addr meaning if you are listening on a remote server?

I recommend sticking to a single method per interface when possible. As Rob Pike says in his Go Proverbs: "The bigger the interface, the weaker the abstraction"

But would you perhaps suggest that we should do some type assertion checking and return a custom interface that embeds net.Conn instead, for the sake of compatibility.

Not a type assertion, but your water.Conn could extend the net.Conn , like we do with our transport.StreamConn. Your Dial can return a water.Conn, which can be plugged wherever a net.Conn is needed.

fortuna commented 5 months ago

My other thought is that perhaps WATER should provide higher level primitives. In particular, expose default Stream and Packet Dialers that the WASM code can use, rather than have WASM use tcp/udp.

If you provide Dialer primitives to the WASM code, then you can probably block all system calls, addressing the security concern you brought up in your paper.

gaukas commented 5 months ago

CloseRead and CloseWrite would use sock_shutdown. As long as you respect the sequence of the calls to the fd, it should be safe.

Understood. But who should be invoking the syscall becomes the new issue. The host (WATER Runtime API) or the guest (Transport Module)? The former one is agnostic to the actual progress of a previous writing started by water.Conn.Write, while the latter does not really have full access to syscall (WASI Preview 1 has extremely limited syscall support).

It is actually possible (and almost always the case given WebAssembly runs much slower than native) that when the water.Conn.Write returns, no fd_write has been made yet.

...expose default Stream and Packet Dialers that the WASM code can use, rather than have WASM use tcp/udp.

I should be able to do multiple hops by combining different transports.

I totally agree with you on this and it will be the ultimate goal for a production-ready WATER. Currently, a major blocker for this is the WebAssembly runtime and WASI standards (we have good reason to not want to build our own from scratch but using the existing ones). For example in wazero, the support on fd_write/fd_read on file descriptors are implemented with syscall and rawFd. In which case supporting inserting custom net.Conn into WebAssembly will require major rework on the socket support, which drifts WATER Specs away from standard WebAssembly/WASI runtime and make it infeasible to implement WATER runtime in any platform that we did not officially support (let's just assume if anyone would want to do that first). Also it is unlikely for us to be able to poll a random high-level application.

One of our promise, or let's say compatibility guarantee (hopefully) is that we never wanted to conflict with the current mainline standard (i.e., wasip1 as of 01/23/2024) and it should be viable to build WATER using ANY WebAssembly runtime w/ complete WASI support. So far our patches made to wazero are all about dirty-fixing functions that wazero skipped on their end. In other words, what WATER wants should be completely doable through WASI-based function import/export. And I have been even exploring the viability of wrapping UnixConn (with network being unix vs. unixgram/unixpacket) as a generic forwarder for abstracted net.Conn but made little progress so far.

I will definitely keep exploring possible ways to achieve this, please feel welcome to make suggestions on more specific design/implementation details.

So you can't use water.Dialer where net.Dialer is needed.

Of course I can't use it. What I was saying is regarding the consistency of the function signature, not the interface interchangeability.

you now force me to implement two methods instead of one in order to provide a water.Dialer

Could you please provide a use case where you need to PROVIDE a water.Dialer? It is usually the case that you USE a water.Dialer unless you were talking about implementing a transport/vN (that's usually my job). I assume there might be some miscommunication. But the only Dialer WATER is expecting from a caller is a dialerFunc func(network, address string) (net.Conn, error).

hard to implement your water.Listener interface

Same reason, it is not meant to be "implemented" but used.

Your Dial can return a water.Conn, which can be plugged wherever a net.Conn is needed.

Yes. But for the dialerFunc WATER is expecting some net.Conn to be returned. I was discussing how this net.Conn should be used inside WATER.

If you provide Dialer primitives to the WASM code, then you can probably block all system calls

Again, so far we can't really do it given the lack of ability to transmit types other than int32 between host and guest. We wanted to avoid writing to the memory and returning raw ptr with len at all cost to avoid dirty operations but seemingly it will eventually be inevitable 😆.

gaukas commented 5 months ago

(wow that's a super long response)

gaukas commented 5 months ago

addressing the security concern you brought up in your paper.

I am assuming you were talking about

However, even with this isolation, malicious WATM binaries could still make arbitrary connections and potentially leak sensitive data from a circumvention tool that uses an arbitrary WATM binary.

... this part? I would be curious how provide Dialer primitives to the WASM code would mitigate this risk.

gaukas commented 5 months ago

It is usually the case that you USE a water.Dialer unless you were talking about implementing a transport/vN (that's usually my job).

Same reason, it is not meant to be "implemented" but used.

For a developer wants to import WATER, examples actually includes everything they need.

Don't get confused by RegisterDialer, RegisterListener and RegisterRelay, these are called when you import a certain version of transport (e.g., transport/v0), so that a caller may choose which version they want to support by simply importing the corresponding package:

import (
    _ "github.com/gaukas/water/transport/v0"
    _ "github.com/gaukas/water/transport/v1" // future
    _ "github.com/gaukas/water/transport/v2" // future
    // _ "github.com/gaukas/water/transport/v3" // not imported, support will not be compiled into the binary
    _ "github.com/gaukas/water/transport/v4" // future
)

But I guess this part is confusing. I should add better documentation on these 3 functions as well as the "importing transport/vN" part.

Edit: now seriously considering moving these 3 functions into an internal package. And more seriously questioning my decision on why not doing so when I created them.

Edit 2: I remembered. It is left public to allow any third party transport version support using WATER's infrastructure.