Flushing strategy - Githubissues

sdeleuze commented 6 years ago

Hi, I am starting adding support for Jetty ReactiveStream HttpClient in Spring Framework 5 as a second engine for our Reactive WebClient as an alternative to Reactor Netty (see SPR-15092 for more details).

As you can see in our ReactiveHttpOutputMessage interface, we have 2 flavors when writing the content : one where we let the underlying engine taking care of flushing when relevant, and one where we perform flushing explicitly.

Is there a way to control flushing with current API?

sbordet commented 6 years ago

@sdeleuze I'm confused.

Flushing explicitly may block because of TCP congestion. What's the point of providing a blocking behavior in a reactive API ?

I'm also surprised by Publisher<Publisher<DataBuffer>>. I'm interested in the details of the use case for it that cannot be done by a regular Publisher<DataBuffer>.

sdeleuze commented 6 years ago

Publisher<DataBuffer> splitting does not necessarily matches with flushing semantic. The way how DataBuffer are split in the Publisher<DataBuffer> depends on various element in the Reactive pipeline.

On server side, Tomcat, Undertow, Jetty or Netty will not provide the same chunk size for example. It also varies based on what codecs are used, content type, etc. Some codecs could serialize collections with one DataBuffer per element, and that does not means necessarily that we want a flush per element, usually not I would prefer let the underlying HTTP engine decide what flushing strategy to use (for example flush every 64 KBytes). For streaming use case, it is different, I want to be able to signal explicitly flush signal.

With only Publisher<DataBuffer>, if one want to flush explicitly every 10 Mbytes, it is required to put 10 Mbytes element in memory, which is a not what we want. Time based flushing strategy are also a use case. Publisher<Publisher<DataBuffer>> is much more flexible, and allow to provide flushing signalization without introducing new types artificially while still allowing the underlying engine to flush because its buffer is full for example.

Explicit flushing signal does not mean the underlying engine has to block, it is more a way to provide contextual flushing information in addition to the one configured on the HTTP (which is usually size-based). It means "flush asap" not "flush now".

Does it make sense for you? Does Jetty ReactiveStream HttpClient flushes after each chunk?

sbordet commented 6 years ago

@sdeleuze I am not sure I understand.

If I have a reactive pipeline of say A -> B -> C, whether A wants to flush should be irrelevant to B and C.

I would have expected that the policies about when and how to flush would depend on each component, and configured so separately, so that I can say that A flushes every 64 KiB, B flushes every 2 seconds, and C flushes every N elements. That A says to B "flush" would be contrary to the policy of B.

As an example, if A is generating text that is easily compressible and B is a compressing component, that A says "flush" every 64 KiB to B is unnecessary and possibly very inefficient (e.g. B can generate only few bytes from the 64 KiB received from A).

For Jetty's HttpClient, being A what is provided by user code, how data coming from A is written out to the network depend on the particular transport used. It is different if it is HTTP/1.1, HTTP/2 or FastCGI, because each protocol has different requirements, different configurations and different mechanics (e.g. HTTP/2 has flow control). It also changes based on implementation details that may vary from release to release (e.g. imagine a change in the defaults for the aggregating content output buffer).

The semantic of "flush asap" is undefined, because "soon" is not clearly defined.

In summary, no, there is currently no API in Jetty's HttpClient to "flush" explicitly.

Whether content chunks are flushed one after the other depends on the transport used. It is currently the case for HTTP/1.1 (but we may change the implementation), and it's currently undefined for HTTP/2 because it depends on the number of concurrent requests and on the HTTP/2 flow control status for that stream and that session.

I guess you can implement writeAndFlushWith() by flatmapping the Publisher<Publisher<DataBuffer>>.

Let me know if that works for you.

Thanks !

sdeleuze commented 6 years ago

What I mean by "flush asap" is that this is not blocking or real time, that's an indication from user side that data should be flushed other the network when possible.

Let's talk about more concrete use cases, maybe the examples I used previously was not concrete enough. We are using the explicit flush API to support SSE and application/stream+json in order to request a flush after each element in order to avoid data keep being buffered and not received on the other side.

I understand that protocol or TCP congestion may impact (defer) data flushing, but I have hard time to understand how a streaming oriented HTTP client could support SSE or application/stream+json without providing such feature.

Any thoughts?

sbordet commented 6 years ago

Our experience with Jetty and the CometD project is that techniques such as infinite response, SSE or similar do not work in general over the Internet because intermediaries are rarely updated and do not understand these techniques (i.e. the intermediary does not behave any different if Content-Type: text/event-stream).

Add on top of that the great push towards encryption of web traffic, and now intermediaries won't be able to distinguish between a normal response and a SSE response, so they will just apply buffering without any difference between the two, as they see fit. We have seen the same issues with encrypted WebSocket.

With HTTP/2 and a lot of concurrent requests that all want to send content in a Publisher<Publisher<DataBuffer>> format, the implementation cannot possibly respect the flushing semantic that the API carries, because of flow control, frame interleaving, etc.

Never been a fan of SSE despite having written one of the earliest implementations back in 2011.

sdeleuze commented 6 years ago

What about Servlet 3.1 non-blocking I/O API that supports flushing? Is there something particular on client side that makes this relevant to be supported on server side but not on client side?

sbordet commented 6 years ago

@sdeleuze I'll have to treat you to dinner and many beers after dinner if you want to discuss Servlet 3.1 non-blocking I/O design :smile:

I'm still confused. Flush primitives are needed on server for SSE, not on the client. Do you also have streaming requests ?

sdeleuze commented 6 years ago

Fair point for Servlet 3.1 non-blocking I/O design :wink:

Yes we have streaming requests as well. We allow to request SSE endpoints from a Java client mainly for testing purpose. But for server to server over HTTP/1.1 and HTTP/2, we support application/stream+json, application/stream+x-jackson-smile (binary JSON) and Protobuf streaming support is coming.

sbordet commented 6 years ago

Still confused.

From a Publisher<Publisher<DataBuffer>> what would you like the implementation to do ? You want the implementation to buffer the nested Publisher, and upon its complete event you want to force the implementation to flush ?

If that's the case, isn't this doable with a transformation of the nested Publisher to buffer the data, and then provide just a Publisher<DataBuffer> to the implementation ?

Or you would like to configure the implementation with an output buffer size, so that it will buffer up to that size and then flush independently from the fact that the nested Publisher is complete ?

To sum up: there is no buffering in Jetty's HttpClient implementation. You give it a buffer, it writes it. Because there is no buffering, there is no flush primitive. The only flush primitive available is "wait for the last write to complete" (e.g. in case of TCP congestion).

Your case is then doable in 2 ways: either you transform Publisher<Publisher<DataBuffer>> into Publisher<DataBuffer> by buffering the nested Publisher, or you don't care about flushing, you flatMap the Publisher<Publisher<<DataBuffer>> and each element will be written. Alternatively, Jetty can provide a utility class BufferingContentProvider that always buffers until it's flushed or closed - similar to the first solution but with the utility class provided by Jetty rather than Spring.

sdeleuze commented 6 years ago

Ok I understand better, if there is no buffering in Jetty's HttpClient, I think we are fine with current API. Thanks for your feedback!

gregw commented 6 years ago

@sdeleuze Note that a flush on serverside is provided more for committing the response rather than giving the ability to create some kind of boundary within the content. Once a flush is done, the response headers are set with the type of content boundary to be used (eg chunking vs content-length). So the first flush can be very semantically meaningful. After than, flush is at best a hint to the server to try get the data onto the wire.

jetty-project / jetty-reactive-httpclient

Flushing strategy #1