quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0

Make `quinn_proto::Connection` and `quinn_proto::Endpoint` impl `Sync` #1769

Closed · Pixelstormer closed this 2 months ago

Pixelstormer commented 4 months ago

The only things preventing these types from already being Sync were a lack of Sync bounds on trait objects they were storing, and a RefCell that was only being used to work around a limitation in the API of BinaryHeap.

Adding Sync bounds to the relevant traits was simple enough and didn't cause any warnings or errors, but to remove the RefCell it was necessary to rework the way that pending stream IDs were stored.
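As a rough sketch of both halves of that (the trait and type names here are made up, not quinn's actual ones): a type holding a boxed trait object only picks up the `Sync` auto trait if the trait object type is itself `Sync`, and `RefCell<T>` is `!Sync` by design.

```rust
use std::cell::RefCell;

trait Session {}                  // before: no auto-trait bounds
trait SessionSync: Send + Sync {} // after: `Send + Sync` supertraits

struct Before {
    session: Box<dyn Session>, // `dyn Session` is neither `Send` nor `Sync`,
    queue: RefCell<Vec<u64>>,  // and `RefCell` is `!Sync`, so `Before` is `!Sync`
}

struct After {
    session: Box<dyn SessionSync>, // `dyn SessionSync: Send + Sync`
    queue: Vec<u64>,               // plain ownership, so `After` is `Send + Sync`
}
```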

The specific motivation for this PR was so that I could store these types as components directly in the Bevy ECS, which requires component types to be Send + Sync for multi-threaded system scheduling. The alternative would be to wrap the types in a Mutex or other synchronization primitive, but removing the limitation entirely seemed like the better option, given that the only non-trivial blocker was just a hacky workaround for an API limitation.
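For context, a minimal sketch of that use case, with a hypothetical wrapper type of my own naming: Bevy's `Component` derive requires `Send + Sync + 'static`, so this only compiles once `quinn_proto::Connection` is `Sync`.

```rust
use bevy_ecs::prelude::*;

// Hypothetical wrapper; `#[derive(Component)]` requires `Send + Sync + 'static`.
#[derive(Component)]
struct QuicConnection(quinn_proto::Connection);
```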

The aforementioned rework to remove the RefCell also (IMO) simplifies the logic a fair bit, by utilizing the BinaryHeap's sorting for both priority and fairness. Sorting for stream IDs with the same priority falls back to a recency counter, which is decremented each time that stream is processed. This causes streams of the same priority to be 'cycled' through, in the same way that was previously achieved by popping from the front and pushing to the back of a VecDeque, which has now been removed. Technically the u64 could underflow and break the cycling, but that is highly unlikely to ever happen: the counter is initialized to u64::MAX and only decremented by 1 at a time.
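A minimal sketch of that ordering (names are illustrative, not the PR's actual types): `BinaryHeap` is a max-heap, so deriving `Ord` with `priority` before `recency` pops the highest-priority stream first and, within a priority, the stream holding the largest (least recently issued) counter value.

```rust
use std::collections::BinaryHeap;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Pending {
    priority: i32,
    recency: u64, // starts at u64::MAX; handed out in decreasing order
    stream: u64,  // stand-in for a StreamId; only tie-breaks after recency
}

fn demo() {
    let mut counter = u64::MAX;
    let mut issue = |priority, stream| {
        counter -= 1; // underflow would take ~2^64 decrements, so it's academic
        Pending { priority, recency: counter, stream }
    };

    let mut heap = BinaryHeap::new();
    heap.push(issue(0, 1));
    heap.push(issue(0, 2));

    // Stream 1 was queued first, so it holds the larger counter and pops
    // first; re-queueing it afterwards sends it behind stream 2.
    assert_eq!(heap.pop().unwrap().stream, 1);
}
```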

Ralith commented 3 months ago

> The aforementioned rework to remove the RefCell also (IMO) simplifies the logic a fair bit, by utilizing the BinaryHeap's sorting for both priority and fairness. Sorting for stream IDs with the same priority falls back to a recency counter, which is decremented each time that stream is processed.

I'm strongly in favor of getting rid of the RefCell, but the mechanism described here is more complex than rotating through a VecDeque, and more costly in the common case where most streams have the same priority. Some possible alternatives:

Pixelstormer commented 3 months ago

When implementing this originally, I was fond of the idea of using the BinaryHeap's sorting for both priority and fairness, as well as reducing the number of collections that needed to be allocated, but I'll agree that it hasn't reduced complexity as much as I had hoped.

> • Refactor the original design to store the VecDeques in a Slab on the side and refer to them by (immutable) index

This would increase the number of layers of indirection to 3, needing to go through BinaryHeap -> Slab -> VecDeque to actually retrieve the StreamId. Of course, it's possible that this wouldn't actually impact performance that much, but regardless, it would still require a fair bit of complexity and bookkeeping.
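A rough sketch of that shape, under the same illustrative naming (this is the alternative as I understand it, not adopted code): the heap entries become immutable (priority, index) pairs, and all mutation happens in the slab-owned VecDeques, which is what lets the RefCell go away.

```rust
use std::collections::{BinaryHeap, VecDeque};
use slab::Slab; // assumed dependency: the `slab` crate

struct PendingStreams {
    // Immutable keys only, so no interior mutability is needed in the heap.
    levels: BinaryHeap<(i32, usize)>, // (priority, slab index)
    queues: Slab<VecDeque<u64>>,      // index -> per-priority FIFO of stream ids
}
```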

> • Store a BTreeMap<i32, VecDeque> -- this is probably less efficient when many different priorities are used, but unless that's shown to be a bottleneck, it's pleasantly straightforward

In the common case of only having a few (or even just one) priority in use, this only has 2 layers of indirection, although the tree deepens as more priorities cause the BTreeMap to allocate more nodes. Again, and as you mentioned, this may not actually be a bottleneck, and it would also be a much simpler, and thus more maintainable, implementation.
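A sketch of this second alternative, again with stand-in types: the largest key in the map is the highest pending priority, and rotation within a level is the familiar pop-front/push-back.

```rust
use std::collections::{BTreeMap, VecDeque};

struct PendingStreams {
    levels: BTreeMap<i32, VecDeque<u64>>, // priority -> FIFO of stream ids
}

impl PendingStreams {
    fn pop(&mut self) -> Option<u64> {
        // Highest priority is the largest key; drop the level once drained.
        let (&priority, queue) = self.levels.iter_mut().next_back()?;
        let id = queue.pop_front();
        if queue.is_empty() {
            self.levels.remove(&priority);
        }
        id
    }
}
```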

I think that the latter alternative is better, as making the code simpler and more understandable would be very valuable, and it's not clear that it would even have a meaningful impact on performance. I know that I had some trouble understanding the original implementation because of the tricky bookkeeping needed to always keep at least 1 empty VecDeque around.

Ralith commented 3 months ago

Yeah, I'm not happy with the existing code either. Proceeding with the BTreeMap approach sounds good to me; we can always revisit in the future, and in the meantime it'll be simpler, more readable, and won't incur a silly !Sync.

Pixelstormer commented 3 months ago

So, I've decided to keep the current approach of removing the VecDeques and implementing fairness by including a recency value in the key, but with the keys stored in a BTreeMap rather than a BinaryHeap, as the main source of complexity was really just all the bookkeeping needed to work around how limited BinaryHeap's API is. By comparison, BTreeMap crucially has a range method that does everything we want, with no need for any external bookkeeping. There's also a nightly method, lower_bound, that would make it even more terse if it ever gets stabilised.

Edit: I thought about this some more, and lower_bound would introduce an edge case where it could return streams of a different priority; I think handling that edge case would add more complexity than just explicitly specifying the full range.
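In sketch form, under illustrative names (the PR's actual types and fields will differ): fairness lives in the key, pop_last services the highest-priority, least recently queued stream, and an explicit full range scopes a query to a single priority band.

```rust
use std::collections::BTreeMap;

#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Key {
    priority: i32,
    recency: u64, // starts at u64::MAX, decremented per insertion
}

struct PendingStreams {
    queue: BTreeMap<Key, u64>, // key -> stream id (u64 as a stand-in)
    counter: u64,
}

impl PendingStreams {
    fn new() -> Self {
        Self { queue: BTreeMap::new(), counter: u64::MAX }
    }

    fn push(&mut self, priority: i32, stream: u64) {
        self.counter -= 1;
        self.queue.insert(Key { priority, recency: self.counter }, stream);
    }

    fn pop(&mut self) -> Option<u64> {
        // Largest key = highest priority, then least recently queued stream.
        // `pop_last` is the BTreeMap method stabilised in Rust 1.66.
        self.queue.pop_last().map(|(_, id)| id)
    }

    fn has_pending(&self, priority: i32) -> bool {
        // The explicit full range mentioned above: everything in one
        // priority band, with no external bookkeeping.
        let lo = Key { priority, recency: 0 };
        let hi = Key { priority, recency: u64::MAX };
        self.queue.range(lo..=hi).next().is_some()
    }
}
```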

I think that this time round, it actually is much simpler than before: this removes all of the aforementioned complexity of working around BinaryHeap's limited API, which both of the previous implementations ran into, while still eliminating all of the bookkeeping involved in managing the VecDeques. I'll admit that, personally, I'm particularly averse to (the bookkeeping introduced by) the optimization of keeping 1 empty VecDeque around when there are no pending streams, as I think that was the hardest part of the original code to understand. Thus, I'm definitely biased towards wanting to remove the VecDeques, to eliminate that unintuitive logic and its related bookkeeping.

> the mechanism described here is more complex than rotating through a VecDeque

I don't think the idea of the recency value is actually any more complex than rotating a VecDeque, as fundamentally, they're both doing the same thing: implementing round-robin scheduling. I'm hoping that this similarity was previously just being hidden by the surrounding bookkeeping, and that it is now more apparent.

Pixelstormer commented 3 months ago

So, CI is failing because of the use of the pop_last method, which was stabilised in 1.66, while Quinn's current MSRV is 1.65. I don't know what Quinn's MSRV policy is, but given that it's only a difference of a single version, there is an argument for bumping the MSRV for this, as the original motivation for adding pop_last and other related functions was that they are meaningfully faster than the old way of calling .iter().next()/.iter().next_back().
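For reference, a hedged before/after of the pattern in question (not the PR's exact code): the old idiom needs one tree walk to find the extreme key and a second to remove it.

```rust
use std::collections::BTreeMap;

// Pre-1.66: find the max key, then remove it in a separate traversal.
fn pop_last_old(map: &mut BTreeMap<u64, u64>) -> Option<(u64, u64)> {
    let key = *map.iter().next_back()?.0;
    let value = map.remove(&key)?;
    Some((key, value))
}

// Rust 1.66+: one traversal that both finds and removes the entry.
fn pop_last_new(map: &mut BTreeMap<u64, u64>) -> Option<(u64, u64)> {
    map.pop_last()
}
```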

djc commented 2 months ago

> So, CI is failing because of the use of the pop_last method, which was stabilised in 1.66, while Quinn's current MSRV is 1.65. I don't know what Quinn's MSRV policy is, but given that it's only a difference of a single version, there is an argument for bumping the MSRV for this, as the original motivation for adding pop_last and other related functions was that they are meaningfully faster than the old way of calling .iter().next()/.iter().next_back().

I think bumping to 1.66 is probably fine at this point -- please either squash your other commits and add a separate commit for this or do this in a separate PR.

Ralith commented 2 months ago

I went ahead and opened https://github.com/quinn-rs/quinn/pull/1810 for the MSRV bump, since I was excited to get rid of some janky old BTreeMap iterator use.

Pixelstormer commented 2 months ago

Rebased on top of #1810

Pixelstormer commented 2 months ago

It looks like #1810 was updated in between my rebasing onto it and it getting merged into main. Should be fixed now.

djc commented 2 months ago

Great -- will leave it to @Ralith to have another look before merging, thanks!

djc commented 2 months ago

Can you squash the new simplification commit into the first commit?