ibaryshnikov commented 5 years ago

Is there a way to use websockets in tide?

aturon commented 5 years ago

Not yet! This would require exposing a bit more from hyper, in particular http upgrades. Definitely worth taking on!

aturon commented 5 years ago

I'm going to mark this as a "Design" question right now, because it'd be good to talk through the API for exposing http upgrades within Tide.

secretfader commented 5 years ago

I recently found tokio-tungstenite, a project which provides tokio bindings for Tungstenite, a WebSocket library. I haven't personally reviewed the code, but it might be worth a look (at least for inspiration).

najamelan commented 5 years ago

Tungstenite works on a socket which implements io::Read + io::Write (tungstenite), or AsyncRead + AsyncWrite (tokio-tungstenite).

Here you can have a stream (like a TCP) and tungstenite will handle the websocket handshake for you.

When looking at the examples of hyper, it looks like hyper handles the handshake for you. This makes sense, as it's also listening for other http requests that might not be upgrade requests. On the other hand, it might be cleaner to leave every library their own speciality. I don't know how feasible it would be for hyper to see that it's an upgrade request, leave the data in the stream and expose that, but that's not how it works right now anyway.

In any case this is supported by tungstenite and tokio-tungstenite by the from_raw_socket method, which assumes the handshake has already taken place. It's not possible I think with these libraries to use custom subprotocols and websocket extensions I think (it doesn't look like hyper supports them either), and since these are negotiated in the handshake, a method like from_raw_socket would have to take information about subprotocols, which it doesn't.

Currently tokio-tungstenite still operates on futures 0.1 types.

So in brief, to enable (tokio-)tungstenite, it suffices to expose a socket which implements io::Read + io::Write (tungstenite), or AsyncRead + AsyncWrite (tokio-tungstenite). When using hyper under the hood, it's a matter of exposing the Upgraded object from hyper.

I have not looked into other websocket crates to see what would be needed in order to support them.

najamelan commented 5 years ago

ps: warp has all the code for exposing a Stream+Sink of tungstenite::Message on upgrade. This can inspire implementors.

najamelan commented 5 years ago

Ok, I've given the whole thing some more thought. Here is what I think:

Currently webservers like hyper handle http(s). Tungstenite also does this, as you can let it listen on a type that provides io::Read + io::Write. Unfortunately hyper only allows listening for http connections on a TCP (AFAICT). So tungstenite allows you to listen for websocket connections on any type of connection, could be UDS, named pipes, mem mapped file etc, where hyper only supports TCP.

The downside is that tungstenite also has to deal with HTTP and TLS (I suppose that when you upgrade an HTTPS connection, the websocket logic/crate no longer needs to worry about the encryption). This is redundant and creates extra work and security risks.

If hyper would be more generic, and support any AsyncRead/AsyncWrite rather than only TCP, crates like tungstenite could rely on it for the HTTP upgrade part, eliminating a bunch of code.

Both sides could focus on implementing support for subprotocols and websocket extensions to be feature complete and on converting to futures 0.3.

Now frameworks like warp and tide can expose an object that implements Sink + Stream of tungstenite::Message and holds information about subprotocols and extensions. Ideally, the type of the object returned by both warp and tide (and possibly other frameworks) would be the same and only implemented once in a separate crate. It would be a type that takes in an upgraded http connection + subprotocol and extension metatdata and that provides the Sink+Stream.

This way, there is no double work/maintenance. Every layer is clearly defined. This new crate could supersede tokio-tungstenite and provide futures 0.3 types. If hyper is to high level for dealing with websocket handshakes over things other than TCP, maybe this can also be separated out into a crate that uses http and http-parse to do so and both hyper and tungstenite can reuse this logic?

Personally I'm working on providing AsyncRead/AsyncWrite on top of such a Sink/Stream of websocket messages so that wasm code and servers can conveniently communicate rust objects on a websocket connection framed with tokio-codec/futures-codec. This logic could potentially go in the same crate as the Sink/Stream.

@seanmonstar @agalakhov What do you think of this?

agalakhov commented 5 years ago

@najamelan Such a layered system was exactly that I wanted when I coded Tungstenite. I wanted to have a single, well-tested system that can work for both async (probably Tokio-based) and purely synchronous (maybe even microcontroller-based) projects.

najamelan commented 5 years ago

So to summarize how this could be:

one handshake crate: takes a io::Read + io::Write and/or AsyncRead + AsyncWrite which are expected to receive a http upgrade request and metadata (such as which subprotocols the server accepts). It would returns a Result to a raw socket on which the handshake has been completed + metadata. This should probably also hold client side code for initiating the handshake.
one codec crate that frames a raw socket and provides a Sink + Stream of websocket Messages and a synchronous equivalent.
low level crates like hyper and (tokio-)tungstenite, as well as frameworks can both use these two crates, possibly deciding to handle Ping, Pong messages for the user. They can expose the Sink+Stream to end users.
layers on top of this like proving AsyncRead + AsyncWrite which can work regardless of which web framework the user wants to use, and which can work on connections that aren't TCP thanks to a unified low level API.

seanmonstar commented 5 years ago

Hyper's server works with any AsyncRead + AsyncWrite. It just uses TCP if you use the convenient Server::bind constructor.

The warp framework built on top of hyper uses it's built-in upgrade support, and after a WS handshake, uses tungstenite afterwards.

najamelan commented 5 years ago

@seanmonstar Oh, sorry. I had a look a the impl of Server. That solves that than.

acasajus commented 4 years ago

Can this be taken into account in the new design? There are not many web frameworks that allow http+ws in the same codebase and this would greatly help in adopting tide!

yoshuawuyts commented 4 years ago

@acasajus heh, for sure. I also would like to support #234. @zkat pointed me to the Elixir Phoenix framework, and so far it seems like one of the most solid approaches to websockets, and will probably take inspiration from that.

Additionally @ibaryshnikov has done work on implementing a standalone websocket implementation that we could probably reuse here. However before doing any of that I'd like to present the design, and gather feedback. We're not quite at that stage yet though, as the core of Tide is still under construction.

In short: this is definitely being considered, and I agree we should make this a first-class construct!

kellytk commented 4 years ago

yoshuawuyts commented 4 years ago

Came up with a rough API sketch for what a websocket API could look like. Similarly been drafting one for server sent events over in https://github.com/http-rs/tide/issues/234#issuecomment-577932544.

API sketch


use async_std::task;

#[derive(Deserialize, Serialize)]
struct Cat { name: &'static str }

let mut app = tide::new();
app.at("/wss").get(tide::ws()); // endpoint to establish a websocket connection on
app.at("/", async |req| {
    let mut socket = req.ws(); // access the connected socket

    socket.send_json(&Cat { name: "chashu" }).await?; // send some data as json
    let msg: Cat = socket.recv_json().await?; // receive some data from json

    Response::new(200.into())
});
app.listen("127.0.0.1:8080").await?;

xldenis commented 4 years ago

@yoshuawuyts it seems like with your api it wouldn't allow websockets to have their own lifecycles. Even writing something as simple as a chat server would be impossible. If it's possible to instead pass in an async closure (just like normal requests) that would be much more useful.

yoshuawuyts commented 4 years ago

@xldenis you're not wrong; related conversations about this have been happening on the SSE issue that might be worth reading up on.

https://github.com/http-rs/tide/issues/234#issuecomment-580714421

pickfire commented 4 years ago

Sorry, I may not have keep track on tide for some time. But would it be good to stick with _json for now? Or is it useful to have Into<impl Response> or something similar to not limit to just JSON? This may not be a huge usecase for websockets but seemed more useful for SSE since people can just send plain text.

Yarn commented 4 years ago

I put together a working prototype for websockets in tide based on async_tungstenite. If we can iron out how the functionality should be exposed I can make PRs for websocket support.

I think it makes sense to expose websockets in tide in essentially the same way as SSE. The harder question is the api at lower levels.

Tide

app.at("/ws").get(|mut req: Request<State>| async move {
    let mut res = Response::new(200);

    res.websocket(|mut ws| async move {
        while let Some(msg) = ws.next().await {
            let msg = msg?;
            ws.send(msg).await?;
        }
    });

    Ok(res)
});

Not sure if we should expose tungstenite types directly. Based on warp's attempt to abstract away tungstenite it looks like it's non trivial, I'm not clear on the details of why.

Maybe we could have a wrapper around tungstenite::Message that supports simple/standard use cases but still allows you to fall back to the underlying tungstenite object for interop with other libs or to access the full websocket feature set. We could put the ability to get the underlying tungstenite object behind a feature flag and not give semver guarantees for that part of the api if we want to be able to update tungstenite between major releases of tide.

http-types

I think the biggest question around this is around what api http-types should expose.

I see 2 main options here

Option 1

Expose an api to convert a connection to a websocket and have http implementations (e.g. async-h1) handle the specifics of the websocket handshake then provide a Read+Write object (or Stream/Sink over websocket messages). This api can have a failure mode for when the http implementation doesn't support websockets.

Option 2

Expose an api to get a Read + Write object for a request/response body
Expose an api for getting/setting http/2 psuedo-headers
Expose an api for checking what http version is in use.

For an http/2 implementation to not be aware of web sockets at all we would need control over http/2 settings as well, this one would be a bit weird since it's per connection, not per request.

I think option 1 is the better choice here. Option 2 would require a lot of coupling between the http implementation and other code.

As a side note I think "Expose an api to get a Read + Write object for a request/response body" should be done anyway. The reasoning is a different discussion but implementation wise it will probably essentially be done anyway to support websockets.

async-h1

Some implementation work will be needed, pretty sure nothing interesting.

Links (mostly to save myself a second finding them again) websocket over http/2 rfc http/2 rfc pr for upgrade in http-types

yoshuawuyts commented 4 years ago

@Yarn Thanks so much for making progress on this! I think indeed at the higher level trying to create parity with tide::sse would be the way to go.

Regarding the proposals

Regarding the two options you've presented: the idea so far has been to go with option 2. We haven't made any progress towards http/2 pseudo headers quite yet, but the idea was that this would be specific to HTTP/2 implementations, which would be able to derive those from the existing Request and Response objects. Additionally:

We expose an API for exposing a Read + Write object from a Sender: Response::send_upgrade.
On the client side, upgrading a connection is a matter of consuming the Body stream after the upgrade sequence has completed.
We expose both Request::version and Response::version to check which version is in use.

Further rationale for a native upgrade mechanism

The second option presents a way to not just upgrade from HTTP -> WebSocket, but to upgrade from HTTP to any other protocol. This includes non-ALPN HTTP/2 connections (as present in the HTTP/2 spec), the new version of WebRTC (upgrade HTTP to QUIC + some framing) and likely also WebUDP (upgrade HTTP to QUIC).

Current state of the implementation

The main place where we left off the implementation was to make async-h1 work with http_types::upgrade. We weren't sure if this would integrate correctly, which is why http_types::upgrade is still behind a feature flag. If that works correctly, using async-tungstenite would be a matter of performing the right handshake dance to establish the connection, and then use the tungestenite framing from there on out.

Yarn commented 4 years ago

http/2 pseudo headers ..., but the idea was that this would be specific to HTTP/2 implementations, which would be able to derive those from the existing Request and Response objects

(Noticed after I wrote the text below that I should clarify it's talking about special pseudo headers like : protocol in the websocket handshake, not the usual headers like :method and :path)

This strikes me as the worst of both worlds. It would require not only both the consumer of http-types and the http implementation to know how to handshake a specific protocol (or do anything else requiring special pseudo headers) in a given http version but also for the consumer to know what fields the http implementation adds and how it computes them.

Would an http implementation be required to implement specific functionality or would it be optional? If it's optional how would a client check if it's implemented? (not checking would likely result in non spec compliant requests if the functionality wasn't implemented)

Where would this expected behavior be documented? Would deciding on this behavior be tied to http-types development or be out of band?

It seems like this would make it extremely easy to introduce subtle http implementation specific bugs if more http implementations show up.

This would no doubt work for websockets but I feel that if we are requiring http implementations to include logic for part of the websocket handshake we should probably just have them handle the entire thing.

The second option presents a way to not just upgrade from HTTP -> WebSocket, but to upgrade from HTTP to any other protocol.

I agree and this is the main reason I think we should have an api for getting the body as a Read + Write object regardless of what code ends up being responsible for the websocket handshake.

Another use case I ran into (that was what got me looking at http-types closely to begin with) is compressed sse. The inability to flush with the Read interface makes it a lot harder to re-use existing code for compression.

I haven't actually tried the existing upgrade code(noticed it too late) but I think it should work. I do have some questions/proposed changes.

Is there a specific reason for the channel based approach that was used? A callback based api seems easier to use/implement and just simpler in general. It also avoids the relatively easy mistake of awaiting the channel without creating a new task (which would presumably just hang without any error indicating why). You can always send the io object through a channel in the callback if needed.
I think it might be a good idea to use some name besides "upgrade" for the api to avoid the impression that it's somehow coupled to the http/1.1 upgrade header.

Yarn commented 4 years ago

I've been thinking about it a bit more and I'm leaning toward option 2 now. I think the most awkward bit with this approach is how SETTINGS_ENABLE_CONNECT_PROTOCOL gets set with http/2 but this can happen outside the http-types api though.

I think this could also contribute to a situation where it's tricky to figure out what combination of crate versions actually work together if the websocket handshake is split out into a different crate. Given this will typically be wrapped by a framework like Tide it probably would be a non issue normally.

The discussion about how to handle psuedo headers in http/2 isn't a blocker for most of the work here so I'm going to start moving forward more on implementation.

najamelan commented 4 years ago

Just thought I'd chip in quickly. You can find an overview of problems that I found with the warp implementation in this issue: seanmonstar/warp#257. All in all, it comes down to a mediocre layer on top of tungstenite to avoid exposing the dependency. I think it's a poor choice, because all it tries to prevent is a breaking change if the backend were to change, however fixing the issues is a breaking change as well, so it doesn't even solve that, ending up being a pure loss and leaving it in an unusable state for now.

Some of the big issues with warp where not allowing extraction of binary data without copying (that's fixed now) and the impossibility to extract any information when an error happens (isn't fixed).

In my opinion, there is 2 options:

Build a fully featured, well designed and well tested websocket interface that won't require breaking changes to fix and hide the dependency. Should be properly used, tested and vetted by people actually using websockets.
Simply expose tungstenite types for now and call it a day. The tungstenite API is rather decent, which is part of why warp ended up with something worse than what they were trying to hide I suppose.

Yarn commented 4 years ago

Above I proposed

we could have a wrapper around tungstenite::Message that supports simple/standard use cases but still allows you to fall back to the underlying tungstenite object for interop with other libs or to access the full websocket feature set. We could put the ability to get the underlying tungstenite object behind a feature flag and not give semver guarantees for that part of the api if we want to be able to update tungstenite between major releases of tide.

I'm not exactly sure what the api is going to look like yet but I'm currently leaning towards having a way to fall back to a tungstenite::Websocket but not allow falling back for individual messages.

Yarn commented 4 years ago

I finished initially adding websocket support.

Main remaining tasks I see

Testing/Benchmarking, especially around the async-h1 changes (I've run cargo test but that's pretty much it so far)
Writing new tests/docs/examples for the websockets functionality
Merging the latest tide changes
Improve support for headers in websocket handshake

I haven't implemented a new api for websockets, it just allows direct access to the tungstenite object. I think making a new api can (and probably should) wait. I've wrapped the tungstenite object in a wrapper so a new api can be added without breaking changes.

It contains pretty substantial changes to async-h1 which I didn't notice would be needed initially. The core issue was that async-h1 creates a BufReader that ends up owned by the Request, this means that there is some arbitrary amount of data that gets put in a buffer you have no way of getting back (this is also an issue for implementing keep alive since you have no way of finding the end of the current request or getting the start of the next request if it's in the buffer). There is also an assumption that a response body is always chunk encoded or has a content length.

I made a new crate containing websocket handshake logic. It doesn't currently fully support parsing the websocket related headers (the spec requires rejecting invalid headers).

A working test is at https://github.com/Yarn/tide_ws_test

Patched libraries: tide (diff) (PR) async-h1 (diff) (PR) http-types (diff) (PR)

New library: websocket_handshake

pickfire commented 4 years ago

Do we really need let mut ws = handle.into_inner();? Since tide focuses on being ergonomic I am not sure if that's good.

Yarn commented 4 years ago

The idea is that the handle will provide a basic api that covers some set of basic use cases. Probably hiding non text messages and providing automatic serialization/deserialization or something along those lines. Adding a wrapper type now means we should be able to add that later without breaking existing code using tungstenite directly.

Yarn commented 4 years ago

I was asked to write up some info about the PR so I'm going to try to dump some info about it.

server/channel_bufread.rs This is essentially BufReader from async_std with a channel instead of Read and implementing Write that passes calls through to the contained IO.

server/decode.rs The decode_with_bufread function takes a BufRead instead of a Read and returns info about the type of body in the request. In the current design this isn't actually needed and we could just use decode (at the cost of an extra layer of buffering).

server/mod.rs The channel BufReader is used instead of a clone of the io. A task is spawned which drives the underlying IO object. It performs the following steps.

Read the headers, split the buffer and send the start containing the full headers to the reader owned by decode
If no body has been received try to read the start of it
Try to send the start of the body to either the reader owned by endpoint or the http implementation
Continue sending data to the reader that received the first part of the body until the reader is exhausted

In order to avoid allocating new buffers a pair of Arc<Box<[u8]>> is used. Before a new buffer is accepted from the channel BufReader drops the current one it holds. The io task can then get a mutable reference to the buffer to read in more data while BufReader holds the other.

There is still a lot of room for this implementation to be cleaner but I want review/feedback of the overall concept before putting a bunch more time into it. It's very much possible I'm missing a better solution. There are also some things that still need to change, either to facilitate keep alive or to iron out the exact semantics of Body::io.

To answer a common question, why not have a clonable BufRead? BufRead returns a &[u8], this requires the reference to outlive the function. If you wrap the buffer in a Mutex or similar so it can be cloned the reference to it can't outlive the function.

awestlake87 commented 3 years ago

Hey, I haven't been involved in the design of this at all, so I don't know all the details and motivations, but websocket support is a feature that I definitely want in the near future. I was just wondering if you guys had considered extending the URL router to handle different protocols generically on top of a TcpStream.

I know in async-tungstenite you just create a standard TCP listener that acts as a stream of TcpStream for every connection to the server, then the WebSocket protocol handshake happens on top of that by passing the TcpStream to async_tungstenite::accept_async. This makes me think that all we really need is to make the router forward TcpStream to some "ws" and "wss" protocol handlers and a 3rd party library like async_tungstenite could handle the rest.

It seems like some of the focus on this design is encapsulating websockets so you don't have to worry about it changing in the future, but as a user, I think it'd be easier for me to use whichever library is best suited to my application + existing library dependencies. Plus it'd be less to develop and maintain and could potentially extend to other protocols as well.

Again though, I don't know if I have the full picture. Is there any reason why this isn't possible / is a bad idea?

Edit: Also, I was just using TcpStream as an example. I believe async_tungstenite::accept_async is more generic than that. Looks like it accepts anything that's AsyncRead + AsyncWrite + Unpin.

Yarn commented 3 years ago

Part of these changes is doing essentially what you describe (Body can be a callback that gets an AsyncRead+AsyncWrite object). Almost all of the work is supporting that in async-h1. The implementation uses async_tungstenite, gating this behind a default feature in case people want to only have a different version of async_tungstenite in their dependencies wouldn't be hard to do.

awestlake87 commented 3 years ago

Yeah, I was looking further into it, and I think what I know which part of the picture I was missing. When a TCP connection is made, you don't know anything about the URL the client was requesting, just the IP address and port. So you can't perform route matching until the connection is already working with the application layer messages. Because of that, I don't think the default async_tungstenite::accept_async behavior would work like it usually does.

Sounds like you guys have this figured out though. I'm looking forward to this feature, so thanks in advance for your hard work!

jbr commented 3 years ago

Tentatively closing this as complete with a first release of tide-websockets — eventually this will likely be more closely integrated into tide, but for now it's one extra crate to pull in, and there's room for the api to mature before integration

http-rs / tide

WebSockets support #67

API sketch

Tide

http-types

Option 1

Option 2

async-h1

Regarding the proposals

Further rationale for a native upgrade mechanism

Current state of the implementation