python-trio / trio

Trio – a friendly Python library for async concurrency and I/O
https://trio.readthedocs.io
Other
6.22k stars 344 forks source link

When building Channels on top of a bidirectional Stream, how should we handle closure? #895

Open oremanj opened 5 years ago

oremanj commented 5 years ago

I came across a slightly tricky implication of the structure of Trio's stream ABCs today.

Suppose I'm implementing a channel that sends objects over a byte stream. Since I'd like to be able to use it on a unidirectional transport, for maximum flexibility I'll define StreamSendChannel to wrap a SendStream, and StreamReceiveChannel to wrap a ReceiveStream. All good so far.

Now I want to use these stream channels to send objects in both directions on a bidirectional transport that I have a Stream for. The subtyping relationships say any Stream is-a SendStream and also is-a ReceiveStream, so it looks like I can do this by wrapping the same Stream in two channels, one for sending and one for receiving. This works fine as long as the channel stays open, but since the two channels are independent, they can close independently, and then they clash over the calls they want to make to the stream's aclose(). The problem is that Stream.aclose() doesn't start meaning only SendStream.aclose() if you interpret the Stream as a SendStream -- it closes the receive side too.

I think this might be less confusing if Stream did not inherit SendStream and ReceiveStream. It could still define the same methods, and we could provide a helper for turning a Stream into a SendStream plus a ReceiveStream, that encapsulates the logic of:

This helper doesn't necessarily need to be provided by trio, and would be useful even if we don't change the inheritance hierarchy. I think it's probably not worth including in trio unless we change the inheritance hierarchy, so this issue is mostly about that question.

njsmith commented 5 years ago

Ah ha! You have discovered why v0.2.0 was delayed so long :-).

I struggled this question a lot I was designing the abstract interfaces, and I think Trio's approach is the best available, but it does have trade-offs. So, the reason we have the hierarchy AsyncResource → SendStream, ReceiveStream → Stream is exactly so we can properly model this closure issue you ran into. For most actual streams (in particular TCP and TLS, which are overwhelmingly the most important), sending and receiving are independent operations, but closure affects both halves. So that's exactly what our interface hierarchy says.

A Stream is a subclass of ReceiveStream and SendStream in the Liskov sense – you can use it as either. But you're right, you can't use it as both :-). But the hierarchy doesn't claim otherwise. Imagine if you had an ordered-dict class that implemented both "mutable sequence" and "mutable mapping" interfaces. You wouldn't expect that you could "split the dict in half" and mutate it using the sequence API in one function while mutating it using the mapping API in another function, without them interfering with each other. But with streams it feels weirder because they're almost two separate halves... but they just... aren't.

We already have a way to describe an object that really is an independent SendStream and ReceiveStream.... it's a StapledStream :-). (I guess I should link to this subthread about shutdown_read here too, because it's relevant – in particular I described why shutdown_read isn't part of Stream.)

(Interesting trivia: in the asyncio streams interface, if you have a bidi stream it's always represented as two objects, a StreamReader and StreamWriter. So how do they handle the shared closure state? StreamReader simply has no close method; StreamWriter.close closes both of them. It's... not how I would have done it.)

Anyway... that's all background, I think, and the real question here is what to do with channels-on-top-of-streams. It's a really interesting question! Most protocols you build on top of streams are either intrinsically unidirectional or intrinsically bidirectional... channels are an interesting example where you have the same set of unidirectional/bidirectional options at the public API as you do on the transport layer, so you have to figure out how the sort-of-but-not-entirely-separate semantics of stream relate to that.

Given the annoying fact that common transports don't have unidirectional close, it's not actually obvious to me that the best way to build a bidi Channel on top of a Stream (say, TCP), is to implement a SendChannel on top of a SendStream, and a ReceiveChannel on top of a ReceiveStream, and then staple them together. Of course makes sense to have unidirectional SendChannel that works over a SendStream or Stream, and to have a ReceiveChannel that works over a ReceiveStream or Stream, respectively, but trying to get both of them working over the same Stream at the same time seems messy.

I feel like what you might actually want to do is implement a single bidi protocol over a bidi Stream, and then expose it in one of two ways:

A piece of hard-won meta-advice: remember that the goal isn't to provide every mathematically plausible building block and allow fully arbitrary mixing-and-matching (as tempting as that is!); it's to provide a usable set of tools that people can actually understand and that are well-matched to the problems they encounter in practice. Does anyone actually need to multiplex a fully-independent SendChannel and ReceiveChannel over the same Stream? It seems like most people could either get by with not-quite-independent channels, or else use two streams.

njsmith commented 5 years ago

Looked at this again today while writing #1110.... is there anything still to discuss here or can it be closed?