alexshpilkin opened 5 years ago
This is hard to implement currently because Trio only has one kind of async file, and it delegates all operations to a thread, which isn't appropriate for a Trio socket (since the socket is using non-blocking mode). But once some more work happens on #174 and #219, I could see us winding up with multiple implementations of a single "async file interface", and I think it would make sense to support `makefile()` by using one of those.
A question though: should it be a method on sockets, or should it be a method (or a free function) on Streams? Streams seems more broadly useful to me, since we'd get SSL support and such "for free".
@oremanj In my case, I’d definitely want it to be an operation on sockets, because then trio and curio implementations of async things could share code... But then I don’t quite see the point of streams anyway, as in I don’t understand their advantages over files (which are, after all, the standard stream abstraction).
So I can at least explain why `Stream` isn't the same as files. The idea is that `SendStream` and `ReceiveStream` are minimal interfaces for streaming data. OTOH the Python file interface has a bunch of methods that don't make sense for streaming data (`seek`, `tell`), or that are complex higher-level operations that you don't want to have to reimplement from scratch on every stream class (`readline`), or that are almost what you want but don't have quite the right semantics (`read(n)` tries to return exactly the requested number of bytes, but that's the wrong primitive for streams). The minimal interface means generic helpers like `readline` can be implemented once, without having to reimplement them separately for each `Stream` class.

In an asynchronous world, (asynchronous versions of) APIs like json.dump() become much more useful and potentially uniform across file and network I/O. However, without makefile(), they would need to exist in a write version and a send version, which would be silly.
I think the "trio way" to think about this would be: conceptually `json.dump` should work with any object that you can incrementally send bytes into. Therefore, it should be written against the `SendStream` interface. And if you want to use that to write into a file, then you can make a `SendStream` implementation that writes into a file, and pass that to `json.dump`.
(To be clear: the "trio way" referred to here is something we're making up as we go, so this is tentative and subject to change if there are good counterarguments :-).)
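To make that idea concrete, here is a minimal sketch (the names `dump_json` and `FileSendStream` are made up for illustration, not Trio APIs): a `json.dump`-alike written against anything with an async `send_all` method, plus a trivial adapter that gives an ordinary file object the same shape.

```python
import json

async def dump_json(obj, send_stream):
    # Hypothetical helper: serialize incrementally and push each chunk
    # into anything exposing an async send_all() method.
    for chunk in json.JSONEncoder().iterencode(obj):
        await send_stream.send_all(chunk.encode("utf-8"))

class FileSendStream:
    # Adapter giving a binary file object the send_all() shape, so the
    # one dump_json works for files and network streams alike.
    def __init__(self, fileobj):
        self._f = fileobj

    async def send_all(self, data):
        self._f.write(data)
```

With a real Trio `SocketStream` (which does provide `send_all`), the same `dump_json` would write to the network instead, with no `write`-vs-`send` duplication.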
I’d definitely want it to be an operation on sockets, because then trio and curio implementations of async things could share code...
Trio exposes the raw `socket` API for those who really need to access low-level details, but you should almost never use it... when you do, you're stuck dealing with all the weird OS-specific quirks. Using `SocketStream` lets Trio handle this stuff for you, plus lets your code work with other kinds of streams, like subprocess stdio and TLS encryption. It's unfortunate that Curio forces you to use the socket layer directly, but I don't think Trio should start wedging higher-level features into `trio.socket` to work around Curio's limitations... The Trio abstract interfaces like `Stream` are pretty general though, and not particularly tied to Trio. So I guess you could re-use them on Curio if you wanted?
@njsmith Huh. I think you sold me on streams. Yes, the API is unwieldy, and yes, I’ve been walking around thinking `read()` is silly for quite some time (although my gripe is more with the unbuffered version). (I think that you need a `read_at_least` thing as well, though, even if as a library function, because when you’re writing a parser, you really, really want it, which is why unbuffered `read()` sucks.)

But in that worldview, `makefile()` basically has no place in native Trio programs; those should use streams. Which means it remains largely a compatibility feature, which means, again, that it should be a method on vanilla-Python–compatible sockets. And rewriting the client two times (sync and async) or three (sync, asyncio and Curio, Trio) is still a big deal, even if you rightly feel reluctant about baking compatibility features into the library. So I rest my case regarding the place for `makefile()`.
And I think it’s unfair to characterize the current Trio interface as a complete substitute for sockets; I have to do UDP sometimes, and broadcasts, and sometimes both, and sometimes I even want to `connect()` and `makefile()` a UDP socket to pass it to some code that handles TCP as well. I literally want to do each of these in the thing I’m writing now, if only in a small number of places.

Don’t really see how I could reuse Trio streams under Curio, though, aside from reimplementing them.
(I think that you need a read_at_least thing as well, though, even if as a library function, because when you’re writing a parser, you really, really want it, which is why unbuffered read() sucks.)
Yeah, this is part of #796 too. No one thinks Trio is complete and finished yet :-). But the thing that needs to be added is some kind of helper function or wrapper class, not a part of the `Stream` interface itself.
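For instance, here is a minimal sketch of such a wrapper class (`LineReader` is an illustrative name, not a real Trio class; it only assumes the wrapped object has an async `receive_some` method):

```python
class LineReader:
    """Minimal buffered readline over any object with receive_some()."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()

    async def readline(self):
        # Pull chunks until the buffer contains a newline.
        while b"\n" not in self._buf:
            chunk = await self._stream.receive_some(4096)
            if not chunk:  # end of stream: return whatever is left
                line, self._buf = bytes(self._buf), bytearray()
                return line
            self._buf += chunk
        # Split off one line; keep any bytes after the newline buffered.
        idx = self._buf.index(b"\n") + 1
        line, self._buf = bytes(self._buf[:idx]), bytearray(self._buf[idx:])
        return line
```

Because it only calls `receive_some`, this works over any `ReceiveStream`-shaped object, which is exactly why the helper doesn't need to live in the interface itself.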
Which means it remains largely a compatibility feature, which means, again, that it should be a method on vanilla-Python–compatible sockets. And rewriting the client two times (sync and async) or three (sync, asyncio and Curio, Trio) is still a big deal, even if you rightly feel reluctant about baking compatibility features into the library.
Asyncio doesn't have `makefile` either. And sync `makefile` isn't compatible with async; the whole API is different. So I'm sympathetic to the compatibility issues; it's just not an easy problem. You certainly could implement a `makefile`-like API for trio and asyncio and then use it as a compatibility layer. You could also implement a `Stream`-like API for asyncio and curio. Both take some work; up to you what you think makes more sense for your situation, I guess.
Note that a simple (not necessarily the most efficient) implementation of `read_exactly` is just

```python
async def read_exactly(stream, count):
    buf = bytearray()
    while len(buf) < count:
        chunk = await stream.receive_some(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf
```
And since it only uses method calls on the `stream`, you can reuse that function for `Stream`s implemented using any library.
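For instance, here is that same sort of helper driven by plain asyncio instead of Trio; `MemoryStream` is a made-up stand-in for any object with a `receive_some` method, not a real library class:

```python
import asyncio

async def read_exactly(stream, count):
    # Identical logic to the helper above; nothing Trio-specific.
    buf = bytearray()
    while len(buf) < count:
        chunk = await stream.receive_some(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

class MemoryStream:
    # Hypothetical stand-in: yields data in small chunks, like a socket.
    def __init__(self, data, chunk_size=3):
        self._data, self._chunk_size = data, chunk_size

    async def receive_some(self, max_bytes):
        n = min(max_bytes, self._chunk_size)
        out, self._data = self._data[:n], self._data[n:]
        return out

async def main():
    assert await read_exactly(MemoryStream(b"abcdefgh"), 5) == b"abcde"

asyncio.run(main())
```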
And I think it’s unfair to characterize the current Trio interface as a complete substitute for sockets; I have to do UDP sometimes, and broadcasts, and sometimes both,
Sure, that's fair. I was just talking about `SOCK_STREAM` sockets. Obviously the `Stream` interface doesn't handle UDP, which isn't a `Stream` :-). (And even for TCP we do allow you to drop down to the raw socket layer briefly in case you need to do a `setsockopt` or something, while staying at the `Stream` layer the rest of the time.)
sometimes I even want to connect() and makefile() a UDP socket to pass it to some code that handles TCP as well.
That's interesting – I don't know much about this! Can you say more? Don't you run into issues with the UDP-oblivious code messing up packet boundaries? Do any of the read*
methods work?
There’s nothing much in there. You can `connect()` a UDP socket, which will do nothing on the wire, but will limit it to the specified peer address and port pair for `send()` and `recv()`. Then you can use `send()`/`recv()` or even `read()`/`write()` on it and it’ll do the right thing... Except it will still remain datagram-oriented, so a single datagram must correspond to a single call (and `read()` calls will discard data if you’re not careful). It’s just that my sending code takes care to send each application-level message in a single `write()` call.
So yes, you do run into issues with UDP-oblivious code, but if you control the code, you can avoid them. A smarter `BufferedReader`/`BufferedWriter` implementation could even do some shielding if it wanted to (always `recv()` into a datagram-sized buffer, only send a datagram on a `flush()`, etc.).
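A quick sketch of the behavior being described, using plain stdlib sockets and two UDP endpoints on loopback:

```python
import socket

# connect() on a UDP socket sends nothing on the wire; it just pins the
# peer address, so plain send()/recv() work without specifying it.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.bind(("127.0.0.1", 0))
rx.bind(("127.0.0.1", 0))
tx.connect(rx.getsockname())  # no packet is sent here
rx.connect(tx.getsockname())

tx.send(b"one app-level message")  # each send() is exactly one datagram
msg = rx.recv(65536)               # each recv() returns one whole datagram
tx.close()
rx.close()
```

The datagram-per-call framing is preserved throughout, which is exactly why the sending side must keep each application-level message in a single call.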
#796 is also highly relevant here
I'm new to open source. Shall I submit a pull request of parsing utils as outlined in that issue in order to continue that discussion? It's more a sans-I/O approach, so `read_exactly()` is synchronous, independent of any async mechanisms (trio or otherwise). It takes a bit of async glue code to feed arbitrary bytes into a buffer and extract objects, but it seems minor compared to the client rewrites @alexshpilkin and I wish to avoid.
The Trio abstract interfaces like Stream are pretty general though,... you could re-use them
I hadn't considered using different implementations of the `Stream` ABC as a way to re-use a parsing routine with different async drivers (Curio, asyncio, ...). My hunch is the particulars of a reactor and `await` semantics could be different enough to make it difficult to write truly re-usable parsers.
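The sans-I/O shape sidesteps that concern entirely: the parsing side is purely synchronous, and the async glue for any driver just calls `feed()` with whatever bytes it received. A minimal sketch (`ByteBuffer` is an illustrative name, not an existing library class):

```python
class ByteBuffer:
    """Sans-I/O receive buffer: feed() bytes in from any I/O layer,
    pull parsed pieces out synchronously."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf += data

    def read_exactly(self, count):
        # Returns None when more data is needed, so the async glue
        # (Trio, asyncio, Curio, or blocking code) knows to feed() more.
        if len(self._buf) < count:
            return None
        out = bytes(self._buf[:count])
        del self._buf[:count]
        return out
```

Since the class never awaits anything, the same parser code runs under any event loop; only the few lines that receive bytes and call `feed()` are driver-specific.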
What is it?
The `makefile()` method on Python sockets converts them to file-like objects, mapping `write` to `send`, etc. Trio’s documentation says that `makefile()` for Trio sockets is not implemented, and then goes on to describe an asynchronous version of Python’s file-like API.
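For reference, the synchronous behavior in question, shown here with a stdlib `socketpair` for brevity:

```python
import socket

# makefile() wraps a connected socket in a buffered file-like object,
# so write()/readline() are backed by send()/recv() underneath.
a, b = socket.socketpair()
writer = a.makefile("wb")
reader = b.makefile("rb")

writer.write(b"hello\n")
writer.flush()            # buffered: nothing hits the socket until flush
line = reader.readline()  # b"hello\n"

writer.close(); reader.close(); a.close(); b.close()
```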
Where is it useful?
In an asynchronous world, (asynchronous versions of) APIs like `json.dump()` become much more useful and potentially uniform across file and network I/O. However, without `makefile()`, they would need to exist in a `write` version and a `send` version, which would be silly.

(I’m implementing such an API right now.)