Use Case Definitions - Githubissues

benjchristensen commented 9 years ago

As per initial agreement in https://github.com/reactive-streams/reactive-streams-io/issues/1 to pursue further, this issue is intended to start collecting and defining the use cases.

benjchristensen commented 9 years ago

I created a placeholder document where detailed use cases can be added, discussed and formalized via pull requests: https://github.com/reactive-streams/reactive-streams-io/blob/master/USE_CASES.md

danarmak commented 9 years ago

I think the biggest question raised in the original use cases in #1 is what underlying transports to support. A simple protocol would support TCP, WebSockets, HTTP/2 custom frames, and anything else that looks like a bidirectional fully-duplex byte stream.

HTTP/1.1 is a harder problem. There are ways to force it to behave like a generic TCP stream, such as server-sent events, but doing this requires custom client and server HTTP-handling code, and I think some HTTP proxies might take offense. An HTTP/1.1-based protocol that can be implemented in JavaScript inside a browser, and on top of most if not all server HTTP frameworks, would, I think, have to be limited to the simple case of a single reactive stream per HTTP connection, where the client requests data and the server replies with the whole stream's contents. Then we'd have to add support for resuming after a connection breaks. If this is a real use case, the best protocol for it might be very different from the general TCP-based one, and might benefit from a separate specification.

You also mentioned UDP. Presumably it would need to handle message loss and ordering and splitting large messages across packets. But maybe the regular TCP-based protocol could be layered on top of that with minimal modifications.

benjchristensen commented 9 years ago

I am okay restricting this effort to TCP, WebSockets (and HTTP/2 if possible) and once that succeeds looking at whether there is a subset that would work over HTTP/1.1. UDP is not something I use so I can't comment much on it.

I agree with your perspective though.

I think this is the key part of what you said that we could scope ourselves to:

anything else that looks like a bidirectional fully-duplex byte stream

Regarding HTTP/2: I have not yet attempted it, but it seems that with a custom client we could leverage its framing and push promises to achieve this, though I may be mistaken. Your answer suggests greater insight into this, so I'd appreciate clarification on whether HTTP/2 can be used here.

tmontgomery commented 9 years ago

It would be a bit leaky to allow reliable delivery semantics to be a concern in the protocol. So, UDP is probably not a great alternative as a base layer. Simple reliable delivery as a base should be enough. Something like Aeron is workable.

I think bi-directional full-duplex is fine as a base. TCP, WebSocket, HTTP/2 are all quite possible. "Byte-stream" is probably unnecessarily strong, as message boundaries must also be preserved and framing must be assumed to be practical, i.e. no infinite message length, etc.

danarmak commented 9 years ago

@benjchristensen functionally, HTTP/2 would work. The likely problem is performance.

A naive implementation could send each stream element as an HTTP message. But if the elements are small (e.g. ints) the HTTP framing would be much larger than the elements. Especially with server push, since it includes both a request and a response. So we'd have to do custom framing inside the HTTP entity stream in each direction, just like over TCP.
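To put rough numbers on that overhead: every HTTP/2 frame starts with a fixed 9-byte header, so a 4-byte int sent in its own DATA frame is mostly header. The sketch below counts only DATA frame headers and ignores the HEADERS frames that a full request or response would add, so the real per-message cost would be higher. The class and method names are made up for illustration.

```java
// Rough arithmetic only: compares per-element framing with batched framing.
// FRAME_HEADER_BYTES is the fixed HTTP/2 frame header size; the class name,
// method names and batch sizes are hypothetical illustrations.
final class FramingOverhead {
    static final int FRAME_HEADER_BYTES = 9;

    /** Bytes on the wire when `count` elements are packed `perFrame` to a frame. */
    static long wireBytes(long count, int elementBytes, int perFrame) {
        long frames = (count + perFrame - 1) / perFrame; // ceiling division
        return frames * FRAME_HEADER_BYTES + count * (long) elementBytes;
    }

    /** Fraction of the wire bytes that is framing rather than payload. */
    static double overheadFraction(long count, int elementBytes, int perFrame) {
        long total = wireBytes(count, elementBytes, perFrame);
        long payload = count * (long) elementBytes;
        return (double) (total - payload) / total;
    }

    public static void main(String[] args) {
        // 1000 four-byte ints, one per frame versus 100 per frame.
        System.out.println(wireBytes(1000, 4, 1));
        System.out.println(wireBytes(1000, 4, 100));
        System.out.println(overheadFraction(1000, 4, 1));
    }
}
```

Under these assumptions, one int per frame costs 13000 bytes for 4000 bytes of payload, while packing 100 ints per frame costs 4090 bytes, which is why custom framing inside the entity stream looks attractive.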

I don't know enough about writing in-browser JavaScript to say if an efficient implementation would be possible there. E.g., can a JS/browser client stream request entities?

@tmontgomery I'm not sure I understand your point about framing and message boundaries. Surely they can be handled at the application (RS) level regardless of transport? We could use something like protobuf for framing.
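As a minimal sketch of handling message boundaries at the application level, here is plain length-prefixed framing over a byte stream. It is not protobuf and not a proposed RS.io wire format; the 4-byte big-endian length prefix, the size cap and all names are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical framing: [length: 4 bytes, big-endian][payload: length bytes].
final class LengthPrefixedFraming {
    static final int MAX_FRAME_BYTES = 1 << 20; // assumed cap, so no unbounded messages

    /** Concatenates the messages into one byte array, each with a length prefix. */
    static byte[] encode(List<byte[]> messages) {
        int total = 0;
        for (byte[] message : messages) {
            if (message.length > MAX_FRAME_BYTES) {
                throw new IllegalArgumentException("message too large");
            }
            total += 4 + message.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] message : messages) {
            out.putInt(message.length).put(message);
        }
        return out.array();
    }

    /**
     * Reads every complete frame from the buffer. A trailing partial frame is
     * left unread so the caller can retry once more bytes have arrived.
     */
    static List<byte[]> decode(ByteBuffer in) {
        List<byte[]> messages = new ArrayList<>();
        while (in.remaining() >= 4) {
            in.mark();
            int length = in.getInt();
            if (length < 0 || length > MAX_FRAME_BYTES) {
                throw new IllegalStateException("bad frame length: " + length);
            }
            if (in.remaining() < length) {
                in.reset(); // rewind to the start of the incomplete frame
                break;
            }
            byte[] message = new byte[length];
            in.get(message);
            messages.add(message);
        }
        return messages;
    }
}
```

The same decoder works whether the transport delivers bytes in arbitrary chunks (TCP) or in whole messages (WebSocket), which is the sense in which boundaries can be handled regardless of transport.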

NiteshKant commented 9 years ago

I think there is a disconnect (at least for me) w.r.t. the requirements for the transport.

I see two trains of thought:

1. How to define the reactive-streams contract over multiple transport protocols.
2. How to define a common transport protocol for all clients/servers and layer the reactive-streams contract on it.

Personally, I was thinking about this more as 1 above than 2.

For simplicity, consider a client that taps into a firehose of events on connect. The API for the two models would look like:

Model 1

           ReactiveStreams.newTcpClient("127.0.0.1", 8099).subscribe(/*my subscriber*/);
           ReactiveStreams.newWsClient("127.0.0.1", 8099).subscribe(/*my subscriber*/);

Model 2

           ReactiveStreams.newClient("127.0.0.1", 8099).subscribe(/*my subscriber*/);

(Notice that it does not specify the transport in model 2)

It will be beneficial (again, at least for me) to clarify which model we are discussing w.r.t. transport.

My $0.02 is that at this layer, we should be more geared towards 1 (no opinions whatsoever) so that people can layer opinions on top of it around which protocol they use.

danarmak commented 9 years ago

@NiteshKant I think we are discussing a protocol which can be run over any transport that satisfies certain requirements ("bidirectional fully duplex byte stream"). I think this is what you mean by model 1, although we're describing a protocol, not an API.

That said, it will likely be necessary to specify how it should be implemented on top of particular different transports, like HTTP/2. So it won't be completely transport agnostic.

NiteshKant commented 9 years ago

@danarmak Thanks for the clarification!

I think we are discussing a protocol which can be run over any transport that satisfies certain requirements ("bidirectional fully duplex byte stream")

Great! Makes sense.

I think this is what you mean by model 1, although we're describing a protocol, not an API.

Indeed, I provided the API only to indicate intent, not as an indication of the end goal of the protocol discussion.

benjchristensen commented 9 years ago

functionally, HTTP/2 would work. The likely problem is performance.

@danarmak thanks for the insight on that. This will be something I need to understand better over time as I am exploring either WebSockets or HTTP/2 for broader public usage (device to servers). Internally for service architectures I can use whatever makes sense such as RS.io over TCP.

rstoyanchev commented 9 years ago

I think we are discussing a protocol which can be run over any transport that satisfies certain requirements ("bidirectional fully duplex byte stream"). I think this is what you mean by model 1

I think SockJS is a good example, even if not perfect in every respect (as an example, that is). It provides WebSocket emulation over WebSocket and HTTP with very thin framing. It's very close to TCP + a codec, but the primary focus of SockJS is on browsers (and it's also text based). Whatever the outcome of this, I'd hope it can be used on top of SockJS.

Regarding reliable delivery and the splitting of messages, I also think it's better left as a concern for protocols on top. The WebSocket protocol can split messages, but in practice servers have very concrete message size buffers, so clients and servers using higher-level "sub-protocols" (like STOMP) tend to work around this by providing their own splitting + aggregation in order to provide a predictable message size.
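The splitting + aggregation workaround can be sketched roughly as follows. This is not how STOMP or any particular sub-protocol does it; the one-byte "last chunk" flag and all names are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical chunk layout: [flag: 1 byte, 1 = last chunk, 0 = more follow][payload].
final class ChunkedMessages {

    /** Splits a message into chunks carrying at most maxPayloadBytes of payload each. */
    static List<byte[]> split(byte[] message, int maxPayloadBytes) {
        if (maxPayloadBytes <= 0) {
            throw new IllegalArgumentException("maxPayloadBytes must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        int offset = 0;
        do { // do-while so that an empty message still yields one (final) chunk
            int n = Math.min(maxPayloadBytes, message.length - offset);
            byte[] chunk = new byte[1 + n];
            chunk[0] = (byte) (offset + n == message.length ? 1 : 0);
            System.arraycopy(message, offset, chunk, 1, n);
            chunks.add(chunk);
            offset += n;
        } while (offset < message.length);
        return chunks;
    }

    /** Reassembles the chunks of exactly one message, checking the final flag. */
    static byte[] aggregate(List<byte[]> chunks) {
        if (chunks.isEmpty()) {
            throw new IllegalArgumentException("no chunks");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks.size(); i++) {
            byte[] chunk = chunks.get(i);
            boolean flaggedLast = chunk[0] == 1;
            boolean isLast = i == chunks.size() - 1;
            if (flaggedLast != isLast) {
                throw new IllegalArgumentException("final flag in unexpected position");
            }
            out.write(chunk, 1, chunk.length - 1);
        }
        return out.toByteArray();
    }
}
```

With a fixed `maxPayloadBytes`, every chunk handed to the transport has a predictable upper size, which is the property the sub-protocols are after.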

maniksurtani commented 9 years ago

+1 to just using bidi, fully duplex streams with reliable delivery as a requirement for the base protocol. Start small. Let's revisit HTTP/1 and UDP if we find that there is a desperate need. Also, we should consider what the base protocol could give us for free. A lot of what we want to design here (semantically, at least) already exists in HTTP/2, and the RS.io protocol should make it logical and easy to map concepts across where possible/available.

mattibal commented 9 years ago

Regarding the underlying transport protocols to use, I would also suggest having a look at QUIC (http://en.wikipedia.org/wiki/QUIC). It's a protocol developed by Google to overcome some limitations of TCP. It uses UDP as the underlying protocol, but it offers one or more "connections" (bidirectional streams) like TCP ones.

I think it could be useful as a more efficient alternative to TCP when you have a lot of streams that go from a device A to a device B, that must transfer data concurrently. To do this without QUIC, you can either:

- use a TCP connection for each stream: this is generally inefficient.
- use a protocol that multiplexes all streams over a single TCP connection: this is what HTTP/2 does, and it's generally inefficient because of TCP's head-of-line blocking. This means that if a packet of just one stream gets lost, all other streams are blocked until the packet gets retransmitted. This is one of the main reasons why Google developed QUIC.

QUIC also has some nice features like "connection migration": if you have an active QUIC connection between an Android phone connected over Wifi and a server, and you suddenly go out of Wifi range, the QUIC connection will automagically switch to your new 3G/4G IP address without breaking, unlike a TCP connection.

The QUIC protocol is included in Chrome and it seems to be used by default when connecting to Google websites, so I think it's reliable and battle-tested. But unfortunately there doesn't seem to be any Java-based implementation...

danarmak commented 9 years ago

@mattibal, if we make the RS.io protocol transport-agnostic, it will be able to run over QUIC and any other bidi full duplex byte stream. But that wouldn't be very useful, because we wouldn't be taking advantage of the transport's extra features, such as built-in multiplexing, or the way a UDP-based protocol doesn't have 'connection reset by network middleware' problems like TCP does, so the transport provides the equivalent of connection resuming.

On the other hand, if we do exploit transport-specific features, the protocol specification becomes more complex. We'd have to specify the mapping for each transport, and there are many potential non-TCP transports, none of which seems to me to be greatly more important than the others. Just for multiplexing, there's QUIC, SCTP, maybe SSH, possibly HTTP/2. Implementations would likely also end up being modular and so somewhat more complex.

It's a tradeoff and we need some idea of whether people will actually use the non-transport-agnostic versions, and how much they will gain from it.

tmontgomery commented 9 years ago

I mentioned in #1 that I think it might be good to consider the underlying transport only supporting simplex operation. The primary motivation being breadth of options. We can use an UP and DOWN session as needed for the underlying transport protocol. So, I urge we consider that if possible.
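A rough sketch of the UP/DOWN idea: two one-directional sessions are paired so that each side sees something that behaves like a duplex connection. The in-memory queue stands in for a real simplex transport session, and every name here is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class SimplexPair {

    /** Hypothetical in-memory stand-in for a one-directional transport session. */
    static final class Simplex {
        private final Queue<byte[]> messages = new ArrayDeque<>();

        void send(byte[] message) {
            messages.add(message);
        }

        /** Returns the next message, or null if nothing is pending. */
        byte[] poll() {
            return messages.poll();
        }
    }

    /** One side of a duplex link built from an outbound and an inbound simplex session. */
    static final class Endpoint {
        private final Simplex outbound;
        private final Simplex inbound;

        Endpoint(Simplex outbound, Simplex inbound) {
            this.outbound = outbound;
            this.inbound = inbound;
        }

        void send(byte[] message) {
            outbound.send(message);
        }

        byte[] receive() {
            return inbound.poll();
        }
    }

    /** Returns {client, server}: the client writes UP and reads DOWN, the server the reverse. */
    static Endpoint[] connect() {
        Simplex up = new Simplex();
        Simplex down = new Simplex();
        return new Endpoint[] { new Endpoint(up, down), new Endpoint(down, up) };
    }
}
```

In this arrangement data could flow on DOWN while control signals such as request(N) flow on UP, so the protocol never needs a single session to carry traffic in both directions.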

@danarmak sorry for the delay, but let me answer your question about framing. Yes, we should specify framing at the RS.io level. But we don't need a byte-stream. Some protocols do not provide a byte stream anyway. TCP does, though. But most protocols have to work around the byte stream abstraction to reimpose framing for data.

Due to wanting to use multiple transport protocols, I don't think we will have the luxury of NOT specifying the mapping onto various transport protocols. It will have to be done anyway for protocols we want. Some of those mappings, like TCP and WebSocket, will be (or should be) trivial in nature. Others more... "interesting", like HTTP/2, etc.

The following things I see as RS.io needing to take control of:

- data framing (but maybe NOT data layout)
- (de)muxing
- control framing (e.g. request(N), etc.)
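To make those three responsibilities concrete, here is one possible shape for a frame: a type byte separates data from control frames, a stream id supports (de)muxing, and the data payload is opaque so data layout stays out of scope. The layout, the frame types and all names are assumptions for illustration, not part of any agreed RS.io format.

```java
import java.nio.ByteBuffer;

// Hypothetical frame layout, not taken from any RS.io specification:
// [type: 1 byte][streamId: 4 bytes][payloadLength: 4 bytes][payload]
final class RsFrame {
    enum Type { DATA, REQUEST_N, CANCEL }

    final Type type;
    final int streamId;   // lets the receiver demultiplex frames per stream
    final byte[] payload; // opaque for DATA frames

    RsFrame(Type type, int streamId, byte[] payload) {
        this.type = type;
        this.streamId = streamId;
        this.payload = payload;
    }

    /** Data frame: the payload bytes are not interpreted by the framing layer. */
    static RsFrame data(int streamId, byte[] payload) {
        return new RsFrame(Type.DATA, streamId, payload);
    }

    /** Control frame carrying the demand signalled by request(n). */
    static RsFrame requestN(int streamId, long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        return new RsFrame(Type.REQUEST_N, streamId, ByteBuffer.allocate(8).putLong(n).array());
    }

    long requestedN() {
        if (type != Type.REQUEST_N) {
            throw new IllegalStateException("not a REQUEST_N frame");
        }
        return ByteBuffer.wrap(payload).getLong();
    }

    byte[] encode() {
        return ByteBuffer.allocate(9 + payload.length)
                .put((byte) type.ordinal())
                .putInt(streamId)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /** Decodes one frame; the caller must ensure the buffer holds a complete frame. */
    static RsFrame decode(ByteBuffer in) {
        int typeIndex = in.get();
        if (typeIndex < 0 || typeIndex >= Type.values().length) {
            throw new IllegalArgumentException("unknown frame type: " + typeIndex);
        }
        int streamId = in.getInt();
        int length = in.getInt();
        if (length < 0 || length > in.remaining()) {
            throw new IllegalArgumentException("bad payload length: " + length);
        }
        byte[] payload = new byte[length];
        in.get(payload);
        return new RsFrame(Type.values()[typeIndex], streamId, payload);
    }
}
```

A receiver would decode each frame, look up the subscription by `streamId`, and then either deliver the payload (DATA) or adjust outstanding demand (REQUEST_N).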

While utilizing transport capabilities like SCTP's channels, etc. would be great, I don't see how we can do that with breadth.

danarmak commented 9 years ago

@tmontgomery I agree; RS.io can run on top of a frame-based transport as well as a bytestream, especially if the underlying frames don't have to match the RS.io logical messages.

Would you prefer to continue discussing the transport feature requirements here or in #1, or maybe we should create a new ticket?

tmontgomery commented 9 years ago

@danarmak I'm fine either way, really.

viktorklang commented 9 years ago

The following things I see as RS.io needing to take control of: data framing (but maybe NOT data layout); (de)muxing; control framing (e.g. request(N), etc.)

Now we're talking!

reactive-streams / reactive-streams-io

Use Case Definitions #3
