It sounds like a worthwhile optimization for the common case of small RPCs.
Could we just use 2 frames instead of 3?
With this proposal, we'd have:
It seems to me:
would be sufficient.
In particular for unidirectional streams, StreamComplete(s) would replace StreamReadsCompleted, where the reader tells the writer "I'm done".
There's also the following scenario: I'm done with reading (please stop sending) but I still have something to send you. With this scenario, the remote stream needs to be able to send StreamReadsCompleted before sending the data, right?
With Quic, I think this is handled with the MAX_STREAMS frame. If initial_max_streams_bidi is set to 1 and the client creates a bi-dir stream, it won't be able to open a new bi-dir stream until the peer sends a MAX_STREAMS(MaxStreams=1) frame to allow the client to open a new stream.
The server could actually send this frame even if the first stream is still open.
So with Quic:

- the client sends the request (STREAM_LAST frame)
- the server receives the request (STREAM_LAST frame)
- the server sends the response (STREAM_LAST frame)
- the client receives the response (STREAM_LAST frame)
- the server sends MAX_STREAMS(MaxStreams=1) to allow the client to open a new stream

We'd have to check this with Wireshark... If that's how it works, the question will be whether or not we should do the same with Slic. A two-way icerpc request would still require 3 frames here (2 x STREAM_LAST + 1 MAX_STREAMS).
The question is whether or not this has a significant impact on icerpc latency.
> There's also the following scenario: I'm done with reading (please stop sending) but I still have something to send you. With this scenario, the remote stream needs to be able to send StreamReadsCompleted before sending the data, right?
Yes, you're right.
> The question is whether or not this has a significant impact on icerpc latency.
You mean if sending 3 frames instead of 2 has a significant impact on latency? I bet it does.
> the server sends MAX_STREAMS(MaxStreams=1) to allow the client to open a new stream
When does the server decide to send this frame? It doesn't know when the client reads the STREAM_LAST frame it sent, so arguably it could "bundle" it with the STREAM_LAST frame, which is in a way what we're talking about here with this StreamLastAndReadCompleted frame.
An alternative to this new StreamLastAndReadCompleted frame could be to send both StreamLast and StreamReadsCompleted together in a single write.
The logic being: when reading StreamLast from a bidir stream that is still writable, defer the sending of StreamReadsCompleted until the next write and bundle it with the next Stream/StreamLast frame.
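To make the bundling idea concrete, here is a minimal sketch, assuming hypothetical type and method names (not the actual SlicStream code), of deferring StreamReadsCompleted and coalescing it with the next Stream/StreamLast frame in a single write:

```csharp
using System;
using System.Collections.Generic;

// Illustrative frame types only; the real Slic protocol defines more.
internal enum FrameType { Stream, StreamLast, StreamReadsCompleted }

// Hypothetical sketch: defer StreamReadsCompleted while writes are still open and
// piggy-back it on the next Stream/StreamLast frame, so both frames go out in a
// single write on the underlying connection.
internal sealed class DeferredReadsCompletedWriter
{
    private bool _readsCompletedPending;

    // Called once the application has consumed all the data from the stream input.
    public void OnReadsCompleted(bool writesClosed)
    {
        if (writesClosed)
        {
            // Nothing left to write on this stream: send the frame on its own.
            Flush(new[] { (FrameType.StreamReadsCompleted, ReadOnlyMemory<byte>.Empty) });
        }
        else
        {
            _readsCompletedPending = true; // defer until the next write
        }
    }

    // Called for each application write; 'endStream' marks the last write.
    public void Write(ReadOnlyMemory<byte> payload, bool endStream)
    {
        var frames = new List<(FrameType, ReadOnlyMemory<byte>)>
        {
            (endStream ? FrameType.StreamLast : FrameType.Stream, payload)
        };
        if (_readsCompletedPending)
        {
            // Bundle the deferred StreamReadsCompleted with this frame.
            frames.Add((FrameType.StreamReadsCompleted, ReadOnlyMemory<byte>.Empty));
            _readsCompletedPending = false;
        }
        Flush(frames);
    }

    // Stand-in for encoding the frames and writing them to the duplex connection.
    private static void Flush(IReadOnlyList<(FrameType Type, ReadOnlyMemory<byte> Payload)> frames)
    {
        foreach ((FrameType type, ReadOnlyMemory<byte> payload) in frames)
        {
            Console.WriteLine($"frame: {type} ({payload.Length} bytes)");
        }
    }
}
```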
> When does the server decide to send this frame? It doesn't know when the client reads the STREAM_LAST frame it sent, so arguably it could "bundle" it with the STREAM_LAST frame
Why would it matter to the server whether or not the client has read the STREAM_LAST frame? I expect a well-behaved Quic implementation sends the MAX_STREAMS frame once both the remote stream reads and writes are closed, regardless of the local stream state.
See also the stream states here. "Reads closed" = the "DataRead" state, "Writes closed" = the "DataRcvd" state.
The remote stream should consider that reads are closed only once all the buffered data has been consumed by the application (DataRead state). It shouldn't consider reads closed when it received the STREAM_LAST frame (DataRcvd state).
I suspect it's the issue with .NET Quic (or msquic). It sends the MAX_STREAMS frame too soon, before the data buffered on the remote stream is consumed by the application.
We'll see what Microsoft has to say, it's also quite possible I'm missing something.
If we want to avoid the extra frame, the StreamLastReadCompleted frame sounds simpler to me.
> and bundle it with the next Stream/StreamLast frame.
One potential advantage over the StreamLastReadCompleted frame (other than one fewer frame type) is that we'd communicate StreamReadsCompleted earlier in the event the server keeps writing for a while.
Question: we currently have this exchange:

- StreamLast ->
- StreamReadsCompleted <-
- StreamLast <-

The client doesn't need to send a StreamReadsCompleted -> to the server because it's the "accepter" (the server here) that flow-controls the streams?
And here, as soon as the accepter sends StreamLast, it can dispose the stream?
Let's use "local stream" for the stream created by the client and "remote stream" for the stream accepted by the server.
Yes, the local stream never sends StreamReadsCompleted, see https://github.com/icerpc/icerpc-csharp/blob/8262c7083a59411234a9824fde3cc5f47d724130/src/IceRpc/Transports/Slic/Internal/SlicStream.cs#L149
> because it's the "accepter" (the server here) that flow-controls the streams?
Yes, it's the server that decides when the client can create a new stream.
The remote stream's reads and writes are closed once it has sent StreamReadsCompleted and StreamLast, respectively. When the local stream receives these two frames, the connection's bidirectional stream semaphore is released to unblock a CreateStreamAsync call.
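For illustration, here is a rough sketch of that local-side bookkeeping, assuming hypothetical names (not the actual SlicConnection/SlicStream code): the semaphore is released only once both closure notifications from the remote stream have been received.

```csharp
using System.Threading;

// Hypothetical sketch: the local stream tracks the two closure notifications from
// the remote stream and releases the connection's bidirectional stream semaphore
// only once both have been received, unblocking a pending CreateStreamAsync call.
internal sealed class LocalStreamCloseTracker
{
    private readonly SemaphoreSlim _bidirectionalStreamSemaphore;
    private bool _remoteReadsClosed;   // StreamReadsCompleted received
    private bool _remoteWritesClosed;  // StreamLast received
    private bool _released;

    public LocalStreamCloseTracker(SemaphoreSlim bidirectionalStreamSemaphore) =>
        _bidirectionalStreamSemaphore = bidirectionalStreamSemaphore;

    public void OnStreamReadsCompletedReceived()
    {
        _remoteReadsClosed = true;
        TryReleaseStreamCount();
    }

    public void OnStreamLastReceived()
    {
        _remoteWritesClosed = true;
        TryReleaseStreamCount();
    }

    // A combined StreamLastAndReadsCompleted frame would close both at once.
    public void OnStreamLastAndReadsCompletedReceived()
    {
        _remoteReadsClosed = true;
        _remoteWritesClosed = true;
        TryReleaseStreamCount();
    }

    private void TryReleaseStreamCount()
    {
        // Synchronization omitted for brevity.
        if (_remoteReadsClosed && _remoteWritesClosed && !_released)
        {
            _released = true;
            _bidirectionalStreamSemaphore.Release(); // unblocks CreateStreamAsync
        }
    }
}
```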
Here are the results with the addition of a StreamLastAndReadsCompleted frame. The results vary across runs but overall follow this trend. I don't understand why the task count 16 and 64 benchmarks are slower with the change.
Method | Job | Protocol | Mean | Error | StdDev | Median | Op/s | Ratio |
---|---|---|---|---|---|---|---|---|
InvokeLatency | 0.1.0 | icerpc | 141.51 us | 1.932 us | 1.807 us | 141.71 us | 7,066.7 | baseline |
InvokeLatency | 0.1.1-slicfix | icerpc | 98.13 us | 0.397 us | 0.352 us | 98.11 us | 10,190.9 | 1.44x faster |
InvokeRPSWithTaskCount1 | 0.1.0 | icerpc | 129.57 us | 2.094 us | 2.722 us | 128.80 us | 7,717.8 | baseline |
InvokeRPSWithTaskCount1 | 0.1.1-slicfix | icerpc | 99.51 us | 1.952 us | 2.324 us | 99.20 us | 10,049.3 | 1.30x faster |
InvokeRPSWithTaskCount4 | 0.1.0 | icerpc | 62.18 us | 0.902 us | 0.844 us | 62.49 us | 16,082.0 | baseline |
InvokeRPSWithTaskCount4 | 0.1.1-slicfix | icerpc | 56.74 us | 0.763 us | 0.713 us | 56.82 us | 17,622.8 | 1.10x faster |
InvokeRPSWithTaskCount16 | 0.1.0 | icerpc | 49.81 us | 0.955 us | 2.516 us | 48.93 us | 20,076.2 | baseline |
InvokeRPSWithTaskCount16 | 0.1.1-slicfix | icerpc | 61.14 us | 1.198 us | 1.935 us | 61.64 us | 16,355.6 | 1.20x slower |
InvokeRPSWithTaskCount64 | 0.1.0 | icerpc | 50.80 us | 1.347 us | 3.971 us | 49.73 us | 19,684.9 | baseline |
InvokeRPSWithTaskCount64 | 0.1.1-slicfix | icerpc | 57.00 us | 1.390 us | 4.098 us | 56.69 us | 17,542.7 | 1.13x slower |
gRPC vs IceRPC w/ Slic fix
Method | Benchmark | Mean | Error | StdDev | Op/s |
---|---|---|---|---|---|
PingLatency | IceRPC(icerpc,tcp) | 101.63 us | 1.978 us | 1.850 us | 9,839.2 |
PingLatency | gRPC(http/2,tcp) | 147.01 us | 1.438 us | 1.274 us | 6,802.1 |
PingRPSWithTaskCount1 | IceRPC(icerpc,tcp) | 95.83 us | 1.214 us | 1.136 us | 10,435.6 |
PingRPSWithTaskCount4 | IceRPC(icerpc,tcp) | 55.35 us | 0.955 us | 0.893 us | 18,066.7 |
PingRPSWithTaskCount16 | IceRPC(icerpc,tcp) | 55.79 us | 1.053 us | 1.253 us | 17,923.0 |
PingRPSWithTaskCount64 | IceRPC(icerpc,tcp) | 52.35 us | 1.009 us | 0.894 us | 19,101.0 |
PingRPSWithTaskCount1 | gRPC(http/2,tcp) | 136.44 us | 0.846 us | 0.791 us | 7,329.2 |
PingRPSWithTaskCount4 | gRPC(http/2,tcp) | 53.42 us | 0.957 us | 1.278 us | 18,720.4 |
PingRPSWithTaskCount16 | gRPC(http/2,tcp) | 20.39 us | 0.402 us | 0.376 us | 49,046.8 |
PingRPSWithTaskCount64 | gRPC(http/2,tcp) | 13.52 us | 0.267 us | 0.262 us | 73,944.5 |
For bidirectional streams, the remote stream always sends a StreamReadsCompleted frame when reads are closed. Reads are considered closed when the data from the remote stream input has been consumed. Instead, we could look into not sending this frame in the scenario where writes are not closed yet. The remote stream would send a new StreamLastAndReadsCompleted frame instead of a StreamLast frame. Upon receiving this frame, the local stream can consider the remote stream's reads and writes closed. As a consequence, the connection can release the bidirectional stream semaphore.
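As a rough sketch of that proposal, assuming hypothetical names (the actual SlicStream code differs), the remote stream's write path could pick the frame type like this:

```csharp
// Hypothetical sketch: when reads close while writes are still open, fold the
// reads-closure notification into the last write instead of sending a separate
// StreamReadsCompleted frame.
internal enum StreamFrameType { Stream, StreamLast, StreamReadsCompleted, StreamLastAndReadsCompleted }

internal sealed class RemoteStreamWriteState
{
    private bool _readsClosedNotificationPending;

    // Reads are closed: all the data from the stream input has been consumed.
    // Returns the frame to send now, or null if the notification is deferred.
    public StreamFrameType? OnReadsClosed(bool writesClosed)
    {
        if (writesClosed)
        {
            // Writes already completed with StreamLast: notify reads closure now.
            return StreamFrameType.StreamReadsCompleted;
        }
        _readsClosedNotificationPending = true; // fold into the last write
        return null;
    }

    // Picks the frame type for the last application write on the stream.
    public StreamFrameType LastWriteFrameType()
    {
        StreamFrameType type = _readsClosedNotificationPending
            ? StreamFrameType.StreamLastAndReadsCompleted // closes both sides at once
            : StreamFrameType.StreamLast;
        _readsClosedNotificationPending = false;
        return type;
    }
}
```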