
What types must be read and written from a `TcpConnection`? #27

Open · NiteshKant opened this issue 9 years ago

NiteshKant commented 9 years ago

This came out of the discussion in #21 in this comment, specifically:

My expectation is that the ReactiveIPC layer does the minimal possible: it works with byte buffers as its input and output, and object conversion is left as a separate concern.

This issue intends to discuss what level of data a TcpConnection works with:

  1. Read & write only Bytes
  2. Read & write custom user objects.

If we choose option 2 above, then we need to define the semantics of how a user-defined object is converted into bytes.
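For concreteness, one possible shape for those semantics is a codec pair along these lines (the Codec name and its methods are hypothetical, purely to frame the discussion; decoding from a single buffer assumes complete frames):

import java.nio.ByteBuffer;

// Hypothetical codec pair: W is the user-defined type written to the
// connection, R the type read from it; both directions reduce to bytes.
interface Codec<R, W> {
    ByteBuffer encode(W message); // user object -> bytes on write
    R decode(ByteBuffer bytes);   // bytes -> user object on read
}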

rstoyanchev commented 9 years ago

Is this not also related to #23, where an interception mechanism is expected to be able to change the type of input and output? Likewise #10, which discusses how to deal with transport-specific byte buffer abstractions?

NiteshKant commented 9 years ago

@rstoyanchev certainly interceptors would be a way to convert the types written on the connection. I am unsure whether we need a specific "codec" abstraction or not; we can discuss that sometime too.

#10 is related if we say that Buffer is the way to write bytes on the TcpConnection.

I created this issue specifically to discuss the type signatures of a TcpConnection, originating from your comment on #21. If we are saying that a TcpConnection only handles bytes, and the only way to write bytes is via a Buffer, then the TcpConnection type signature would look like:

public interface Connection<Buffer, Buffer> extends Publisher<Buffer> {
}

OTOH, if we say that one can write any object on the Connection, then it would look like:

public interface Connection<R, W> extends Publisher<R> {
}

If any type can be written on the connection, we would then have to discuss how one configures a connection for those types, e.g. via interceptors, a codec, etc.

jbrisbin commented 9 years ago

I think we need a "codec"; we just don't have to call it that. A codec is, after all, simply an asymmetric pair of transformers. We can specify that by saying .intercept(Interceptor<R,NEWR>, Interceptor<W,NEWW>), as sketched below.
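A minimal sketch of that shape, assuming a hypothetical Interceptor abstraction (none of these names are the project's settled API):

import org.reactivestreams.Publisher;

// Hypothetical: an interceptor is just a transformer over a stream.
interface Interceptor<IN, OUT> {
    Publisher<OUT> apply(Publisher<IN> in);
}

interface Connection<R, W> extends Publisher<R> {
    // Asymmetric pair: decode the read side (R -> NEWR) and
    // encode the write side (NEWW -> W).
    <NEWR, NEWW> Connection<NEWR, NEWW> intercept(
            Interceptor<R, NEWR> reader,
            Interceptor<NEWW, W> writer);
}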

I think a Buffer is, paradoxically, more flexible than a raw Publisher, because it gives a logical extension point for all manner of things that aren't codified by an RS Publisher (which is completely opaque). It is not simply a ByteBuffer clone: a Buffer could be a Buffer<ByteBuffer>, a Buffer<ByteBuf>, a Buffer<Publisher<B>>, or a Buffer<ZFrame> in the ZeroMQ case. Its append methods are not strongly typed, so you can append(Message) if you want.

A Buffer<B> could be a Publisher<B> as well, which allows us to provide all kinds of useful allocation abstractions for code that needs to create new underlying types like ByteBuf and ZMsg without dealing directly with the transport's allocation APIs. We can't assume that all code outside the transport layer has easy access to creating new instances of the objects to be written, especially if there are any generic third-party components at the protocol layer.
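In code, that idea might look roughly like this (illustrative names only, not the project's actual Buffer type):

import org.reactivestreams.Publisher;

// Hypothetical: a Buffer is itself a Publisher of its underlying chunk
// type B (ByteBuffer, Netty's ByteBuf, ZeroMQ's ZFrame, ...). Its append
// is loosely typed so protocol code can write without touching the
// transport's allocation APIs.
interface Buffer<B> extends Publisher<B> {
    Buffer<B> append(Object data);
}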

I feel like Buffer also makes conversion easier to deal with in the abstract.

I'm less of a fan of the connection directly extending Publisher<R> than I was. I think it's actually clearer to deal with a Buffer<R> (which extends Publisher<R>) than with a connection that deals directly with R's.

rstoyanchev commented 9 years ago

We are discussing some form of interception under #23 that can transform the connection, so it makes sense for TcpConnection and TcpHandler to be generic. The question is what comes in and out of the transport (essentially the concrete types seen by the first handler) and how much the transport knows about the types of the Objects that pass through it.

Currently the transport casts to the generic type of the first handler. On the writing side it simply writes to the channel. That means whatever the first handler expects must be a ByteBuf, or there must be a Netty codec in place.

This might be just fine for TCP. For other protocols, however, the transport will play the role of translating to and from Netty types: in HTTP, for example, turning Netty's HttpRequest into a ReactiveIPC HttpRequest, taking Netty's HttpContent and publishing its bytes, etc. The same goes for Memcached and other protocols.
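Illustratively, such boundary translation could look like this (the ReactiveIPC-side type is a hypothetical placeholder; the Netty accessor names follow 4.1):

import io.netty.handler.codec.http.HttpRequest;

// Hypothetical boundary translation: the transport converts Netty's HTTP
// types into transport-neutral ones, and Netty types don't leak past it.
final class RipcHttpRequest {
    final String method;
    final String uri;

    private RipcHttpRequest(String method, String uri) {
        this.method = method;
        this.uri = uri;
    }

    static RipcHttpRequest from(HttpRequest nettyRequest) {
        return new RipcHttpRequest(nettyRequest.method().name(), nettyRequest.uri());
    }
}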

In general I see the transport layer as serving as a boundary between Netty types and ReactiveIPC types, performing translation as needed rather than being completely passive and letting everything through. That said, we shouldn't impose restrictions without good reason.

What may qualify as a good reason is the impact on back-pressure. It is easy to reason about back-pressure when dealing with bytes. I've experimented with simple read and write back-pressure, and it's clear that if Netty codecs are in use it becomes a very different situation, one I'm not sure the transport can effectively deal with, especially since it has no knowledge of the objects going in and out.

https://github.com/rstoyanchev/reactive-ipc-jvm/tree/tcp-poc

For example, when autoRead=false we can call channel.read() once per request(1). When reading bytes, that results in one channelRead event with a ByteBuf no larger than the size of the input buffer. This is simple and predictable. If a codec is involved, a single call to read() may not result in a channelRead event, since there may not be enough data. We can continue to call read() until we get enough to create an object, but then it's much harder to define what request(1) means. Similarly on the writing side, when producing bytes it's easy to define what request(1) means and what's expected from each Publisher; if we get Objects from a Publisher and pass those down to Netty codecs, we have no idea how much data will actually be written.
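To make that concrete, here is a rough sketch of the bytes-only read side under those assumptions (illustrative only; it ignores thread-safety and demand accounting, and is not the PoC's actual code):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

// Illustrative only: with autoRead=false, one request(1) maps to one
// channel.read(), and each read() yields at most one channelRead event
// carrying a ByteBuf bounded by the input buffer size.
class BytesReadHandler extends ChannelInboundHandlerAdapter {

    private final Subscriber<? super ByteBuf> subscriber;

    BytesReadHandler(Subscriber<? super ByteBuf> subscriber) {
        this.subscriber = subscriber;
    }

    @Override
    public void handlerAdded(ChannelHandlerContext ctx) {
        subscriber.onSubscribe(new Subscription() {
            @Override
            public void request(long n) {
                // Demand translates 1:1 into socket reads.
                for (long i = 0; i < n; i++) {
                    ctx.channel().read();
                }
            }

            @Override
            public void cancel() {
                ctx.close();
            }
        });
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // With no codecs in the pipeline this fires once per read() and
        // msg is a raw ByteBuf; a codec in between breaks the 1:1 mapping
        // between request(n) and channelRead events.
        subscriber.onNext((ByteBuf) msg);
    }
}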

All in all, while TcpConnection and TcpHandler should remain generic, I think there are good reasons to draw a clear line at the transport layer and make it work explicitly with bytes. That applies to TCP and will work well for other protocols too (e.g. the body of an HTTP request or response). Transformation to other types is then left strictly as an exercise for the interception chain. I can imagine adapting Netty codecs to the RS Publisher/Processor contracts and vice versa, so that using Netty codecs is still possible; perhaps such adapters can also provide support for buffering and generally assist in dealing with back-pressure.
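As one hedged illustration of that adapter idea, Netty's EmbeddedChannel can drive a codec outside a real pipeline (the adapter class itself is hypothetical):

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandler;
import io.netty.channel.embedded.EmbeddedChannel;
import java.util.ArrayList;
import java.util.List;

// Hypothetical adapter: runs a Netty decoder inside the RS chain rather
// than in the transport's pipeline, keeping the transport bytes-only.
class NettyDecoderAdapter<T> {

    private final EmbeddedChannel channel;

    NettyDecoderAdapter(ChannelHandler decoder) {
        this.channel = new EmbeddedChannel(decoder);
    }

    // Zero or more decoded messages per chunk of bytes. An empty result
    // means "not enough data yet", which is exactly the case that
    // complicates request(n) accounting; a buffering adapter can account
    // for it in one place.
    List<T> decode(ByteBuf bytes) {
        channel.writeInbound(bytes);
        List<T> out = new ArrayList<>();
        T msg;
        while ((msg = channel.readInbound()) != null) {
            out.add(msg);
        }
        return out;
    }
}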

NiteshKant commented 9 years ago

@rstoyanchev your conclusion here seems to come from the fact that it is unclear how back-pressure will be implemented in the presence of codecs in Netty's pipeline. If so, let me concretize the back-pressure implementation in my proposal, and then we can discuss whether back-pressure is a reason to impose the restriction that type conversions must only be done by interceptors.

Personally, I feel that imposing this restriction isn't helpful and forces users to create "pass-through" adapters for Netty codecs. I would like to allow people to add Netty codecs as-is when using the Netty transport, and also for us to be able to write codecs independent of Netty's contracts. Forcing a common path, IMO, creates a false abstraction.

rstoyanchev commented 9 years ago

@NiteshKant by all means I believe back-pressure support, even in the most basic form, is essential for this design discussion.

Note that I'm not simply saying it's unclear how back-pressure could work with Netty codecs. And by no means am I looking for a transport-neutral option; from that perspective, we could just let Objects pass through and make it a choice. It'd be hard to justify restricting the use of codecs simply because some might choose that option (and some might not). It has to come down to better reasons than that.

I would very much like to see your ideas about how the transport can deal with back-pressure when Objects are serialized and deserialized on the Netty side. It's something we have to discuss anyway for #25 and #28, and it would help move this discussion forward.

For now I'll just say that, since we are creating a reactive model on top of transports like Netty, it's fair to ask what the best way and place (or places) in this new model is to deal with back-pressure in the context of network I/O, where we go from bytes to Objects and vice versa. What's the relationship between encoding/decoding and back-pressure? Is it something we need to help solve once, or can it be treated as a black box with regard to back-pressure calculations?

Clearly we need more experimentation. Even if the transport were to deal with bytes only, at some point encoding and decoding still need to be done, and at that point the back-pressure calculations change. My point is that what works naturally when using Netty directly may not be what works best with RS data pipelines, or may lead to design trade-offs.