quinn-rs / quinn

Async-friendly QUIC implementation in Rust

Add fallible synchronous open_uni & open_bi methods #1785

Closed jean-airoldie closed 6 months ago

jean-airoldie commented 6 months ago

A fallible synchronous method is quite useful when testing rate limiting, or when you know that you won't hit the rate limit. Currently this can be done in the following manner:

use std::future::Future;
use std::task::{Context, Poll};
use futures::{pin_mut, task::noop_waker_ref};

let send = {
    // Poll the open future exactly once, using a no-op waker so nothing
    // is actually woken if it is still pending.
    let mut cx = Context::from_waker(noop_waker_ref());
    let fut = self.connection.open_uni();
    pin_mut!(fut);
    match fut.poll(&mut cx) {
        Poll::Pending | Poll::Ready(Err(_)) => todo!("error handling"),
        Poll::Ready(Ok(send)) => send,
    }
};

However, with this method, on Poll::Pending the noop waker is registered, which is wasteful.

An alternative solution would be to add separate try_open methods alongside the open futures, which fail if the stream cannot be opened instantly. However, I am uncertain whether this use case is common enough to justify the additional API surface.

Ralith commented 6 months ago

with this method, on Poll::Pending the noop waker is registered, which is wasteful.

That cost can be avoided by passing in a no-op waker (coming to std soon). See also futures::FutureExt::now_or_never.
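For illustration, the one-shot poll written with now_or_never looks roughly like this (a sketch, assuming the futures crate is a dependency):

use futures::FutureExt; // provides now_or_never()

// Poll the open future exactly once; None means it did not resolve instantly.
let send = match self.connection.open_uni().now_or_never() {
    Some(Ok(send)) => send,
    Some(Err(_)) => todo!("error handling"),
    None => todo!("stream limit reached"),
};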

Neither tests nor code which is guaranteed not to hit the limit would be affected by that marginal cost, anyway. In sum, I don't see a strong case for expanding our API to expose this directly.

jean-airoldie commented 6 months ago

That cost can be avoided by passing in a no-op waker (coming to std soon). See also futures::FutureExt::now_or_never.

Yeah, that's what I'm doing currently. It just looks so dumb.

Neither tests nor code which is guaranteed not to hit the limit would be affected by that marginal cost, anyway.

Yeah, it's only expensive if you hit the limit: you register the waker when you didn't need to, and then dropping the tokio::sync::futures::Notified becomes expensive.

If you think this is too niche, then I can live without it.

Ralith commented 6 months ago

I'd be willing to consider this if you can show a significant benefit in a real-world application, but as a hypothetical I don't think it's justified, particularly as the same case could be made for large swathes of the interface. For readability, I suggest using (or copying) now_or_never.

jean-airoldie commented 6 months ago

For readability, I suggest using (or copying) now_or_never.

I'm not talking about aesthetics; I'm not a VIM golfer. I don't like the idea of having to use a complex interface for a strictly simpler use case, because it makes it hard to reason about performance. I'll give an example.

So I use quinn within a higher-level networking lib where quinn streams serve as envelopes for messages. When I send a message, I check an outgoing rate limiter, then I open a stream and let the actual transmission occur in a background task running on some green thread. I return to the user a future that tells them whether the message was transmitted or whether it errored.

I provide two methods, send_msg and try_send_msg, which are the async and sync variants of the routine I described. One example where I use the synchronous try_send_msg variant is in a fanout method of a publisher server. The semantics I'm looking for in that case are roughly: try to queue this message instantly, otherwise don't bother, because the peer is falling behind and I'm going to close its connection or at least drop this message. This doesn't work with the noop_waker method, because now I could reasonably hit the cold path if the peer is too slow. To know exactly how slow, I would have to know the internals of quinn's futures, which are subject to change from version to version.
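To make those semantics concrete, here is a sketch of the fanout path; try_open_uni is the hypothetical fallible method from this issue, and subscribers, queue_msg and mark_lagging are placeholders from my lib:

// Hypothetical sketch: `try_open_uni` is the method proposed in this issue.
for sub in &self.subscribers {
    match sub.connection.try_open_uni() {
        // Stream credit available: queue the message instantly.
        Ok(send) => sub.queue_msg(send, msg.clone()),
        // No credit right now: the peer is falling behind, so drop the
        // message (or close the connection) instead of waiting.
        Err(_) => sub.mark_lagging(),
    }
}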

I'd be willing to consider this if you can show a significant benefit in a real-world application

I could write a benchmark for my use case comparing the noop_waker approach against this sync method.

jean-airoldie commented 6 months ago

But understand that if you think this is too niche, like you said, I don't mind maintaining my own fork. I always have weird use cases, so I'm used to it.

Ralith commented 6 months ago

The semantics I'm looking for in that case are roughly: try to queue this message instantly, otherwise don't bother, because the peer is falling behind and I'm going to close its connection or at least drop this message.

You could accomplish this without any waker shenanigans or new APIs by allowing at most one stream open operation to be in flight at a time, and judging the peer to be behind if a second would be required. You could do so without potentially compromising delivery latency by keeping a fresh stream open preemptively.
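Roughly like this (a sketch; spare and start_preemptive_open are placeholders for state you'd keep alongside the connection):

// Keep one stream opened ahead of time; if the preemptive open hasn't
// completed yet, treat the peer as falling behind.
match self.spare.take() {
    Some(send) => {
        // Use the preemptively opened stream and kick off the next open
        // in the background so a fresh one is usually ready.
        self.start_preemptive_open();
        Some(send)
    }
    // The previous open is still pending: the peer is out of stream credit.
    None => None,
}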

It sounds like this is a slow path anyway, though.

I could write a benchmark for my use case comparing the noop_waker approach against this sync method.

That would help motivate the discussion, if you want to pursue its inclusion.

Another API design to consider, especially if there's a measurable performance win to be had, would be to expose a getter for the number of streams that may be opened immediately. That simultaneously avoids duplicating APIs and provides additional insight into connection state which might be interesting to monitor.
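For example (a sketch; the getter name and the helpers are made up for illustration, nothing like them exists today):

// Hypothetical getter: how many uni streams could be opened right now.
if connection.available_uni_streams() > 0 {
    // Expected to resolve immediately, so awaiting it is cheap.
    let send = connection.open_uni().await?;
    send_on(send);
} else {
    // No stream credit left: drop the message or close the connection.
    handle_lagging_peer();
}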

jean-airoldie commented 6 months ago

You could accomplish this without any waker shenanigans or new APIs by allowing at most one stream open operation to be in flight at a time, and judging the peer to be behind if a second would be required. You could do so without potentially compromising delivery latency by keeping a fresh stream open preemptively.

The problem with this approach is that if the peer experiences a single latency spike he might get dropped, whereas if you allow as many in-flight streams as the connection is configured to support, you add some leeway to the peer.

Another API design to consider, especially if there's a measurable performance win to be had, would be to expose a getter for the number of streams that may be opened immediately. That simultaneously avoids duplicating APIs and provides additional insight into connection state which might be interesting to monitor.

The problem with this approach would be that you could potentially have false positives due to a race between multiple threads using the same connection, although that wouldn't be a problem in the use case I laid out. On another note, in the future I would like to have access to various metrics related to quinn's inner state so that they can be monitored in real time (such as per-connection ping, read/write throughput, number of open streams, etc.).

Ralith commented 6 months ago

The problem with this approach is that if the peer experiences a single latency spike he might get dropped, whereas if you allow as many in-flight streams as the connection is configured to support, you add some leeway to the peer.

You can have as many in-flight streams as you like regardless. The approach described above is to attempt to open at most one stream at a time, i.e. a single pending open_bi and a single open_uni future.

you could potentially have false positives due to a race between multiple threads using the same connection

If at most N threads may open a stream simultaneously, ensure that at least N streams may be opened.

I would like to have access to various metrics related to quinn's inner state so that they can be monitored in real time (such as per-connection ping, read/write throughput, number of open streams, etc.).

Connection::rtt is already exposed. Feel free to open separate issues for other getters you're interested in, including discussion of how they might be useful.
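For example (assuming a quinn::Connection in scope):

// Connection::rtt returns the current best RTT estimate for the connection.
let rtt: std::time::Duration = connection.rtt();
println!("current rtt estimate: {:?}", rtt);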

jean-airoldie commented 6 months ago

You can have as many in-flight streams as you like regardless. The approach described above is to attempt to open at most one stream at a time, i.e. a single pending open_bi and a single open_uni future.

Kk, I misunderstood. But that just means that if the previous open call hit the cold path, then you know you hit the cold path, which is not particularly useful.

If at most N threads may open a stream simultaneously, ensure that at least N streams may be opened.

That's starting to sound impractical lol.

I'll write a benchmark instead of bothering you with my hypotheticals.

jean-airoldie commented 6 months ago

Alright, so I did write a benchmark for my closed-source library; here are the results and methodology.

Methodology

I'm using criterion with a measurement time of 240 seconds, because there's a lot of noise. I'm benchmarking the open_uni call, which returns a SendStream that is put into a task spawned on a green-thread pool; the task simply drops the stream, because we're not interested in actually transmitting anything on it. I call open_uni until it doesn't resolve instantly, meaning the queue is full; this always happens after opening max_concurrent_uni_streams streams. I create a new connection after each iteration, and that connection setup is not part of the measurement.
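Not the real code, but one measured iteration looks roughly like this (simplified; the try_open variant calls the synchronous method instead of now_or_never, and tokio::spawn stands in for my green-thread pool):

use futures::FutureExt;

// One iteration: open streams until an open no longer resolves instantly,
// handing each stream to a task that just drops it.
loop {
    match connection.open_uni().now_or_never() {
        Some(Ok(send)) => {
            tokio::spawn(async move { drop(send) });
        }
        // Didn't resolve instantly: the stream quota is exhausted, stop.
        None => break,
        Some(Err(err)) => panic!("connection lost: {err}"),
    }
}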

I'm testing with two different max_concurrent_uni_streams values to see if it changes anything.

Results

These are the criterion results. try_open means I'm using the synchronous try_open_uni method; noop_waker means I'm using the current API with a no-op waker. The results are displayed as [min, avg, max]. The throughput is the number of times open_uni is called per second. The /100 or /10 suffix refers to the value of max_concurrent_uni_streams.

try_open/100   [136.99 Kelem/s 137.20 Kelem/s 137.65 Kelem/s]
try_open/10    [109.53 Kelem/s 112.62 Kelem/s 114.93 Kelem/s]

noop_waker/100 [133.77 Kelem/s 133.99 Kelem/s 134.33 Kelem/s]
noop_waker/10  [109.79 Kelem/s 121.97 Kelem/s 134.27 Kelem/s]

Conclusion

The synchronous API does not seem to result in improved throughput.

jean-airoldie commented 6 months ago

Alright, since I can't prove that this improves performance, I guess the noop waker will have to do.