alexshpilkin opened 5 years ago
This is hard to implement currently because Trio only has one kind of async file, and it delegates all operations to a thread, which isn't appropriate for a Trio socket (since the socket is using non-blocking mode). But once some more work happens on #174 and #219, I could see us winding up with multiple implementations of a single "async file interface", and I think it would make sense to support `makefile()` by using one of those.
A question though: should it be a method on sockets, or should it be a method (or a free function) on Streams? Streams seems more broadly useful to me, since we'd get SSL support and such "for free".
@oremanj In my case, I’d definitely want it to be an operation on sockets, because then trio and curio implementations of async things could share code... But then I don’t quite see the point of streams anyway, as in I don’t understand their advantages over files (which are, after all, the standard stream abstraction).
So I can at least explain why `Stream` isn't the same as files. The idea is that `SendStream` and `ReceiveStream` are minimal interfaces for streaming data. OTOH the Python file interface has a bunch of methods that don't make sense for streaming data (`seek`, `tell`), or that are complex higher-level operations that you don't want to have to reimplement from scratch on every stream class (`readline`), or that are almost what you want but don't have quite the right semantics (`read(n)` tries to return exactly the requested number of bytes, but that's the wrong primitive for streams). The minimal interface means generic helpers like `readline` can be implemented once, without having to reimplement them separately for each `Stream` class.

In an asynchronous world, (asynchronous versions of) APIs like json.dump() become much more useful and potentially uniform across file and network I/O. However, without makefile(), they would need to exist in a write version and a send version, which would be silly.
I think the "trio way" to think about this would be: conceptually `json.dump` should work with any object that you can incrementally send bytes into. Therefore, it should be written against the `SendStream` interface. And if you want to use that to write into a file, then you can make a `SendStream` implementation that writes into a file, and pass that to `json.dump`.
(To be clear: the "trio way" referred to here is something we're making up as we go, so this is tentative and subject to change if there are good counterarguments :-).)
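To make that idea concrete, here is a minimal sketch (the names `dump_json` and `FileSendStream` are made up for illustration, not Trio APIs): a `json.dump`-alike written against anything with an async `send_all` method, plus a trivial adapter that gives an ordinary file object the same shape.

```python
import json

async def dump_json(obj, send_stream):
    # Hypothetical helper: serialize incrementally and push each chunk
    # into anything exposing an async send_all() method.
    for chunk in json.JSONEncoder().iterencode(obj):
        await send_stream.send_all(chunk.encode("utf-8"))

class FileSendStream:
    # Adapter giving a binary file object the send_all() shape, so the
    # one dump_json works for files and network streams alike.
    def __init__(self, fileobj):
        self._f = fileobj

    async def send_all(self, data):
        self._f.write(data)
```

With a real Trio `SocketStream` (which does provide `send_all`), the same `dump_json` would write to the network instead, with no `write`-vs-`send` duplication.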
I’d definitely want it to be an operation on sockets, because then trio and curio implementations of async things could share code...
Trio exposes the raw `socket` API for those who really need to access low-level details, but you should almost never use it... when you do, you're stuck dealing with all the weird OS-specific quirks. Using `SocketStream` lets Trio handle this stuff for you, plus lets your code work with other kinds of streams, like subprocess stdio and TLS encryption. It's unfortunate that Curio forces you to use the socket layer directly, but I don't think Trio should start wedging higher-level features into `trio.socket` to work around Curio's limitations... The Trio abstract interfaces like `Stream` are pretty general though, and not particularly tied to Trio. So I guess you could re-use them on Curio if you wanted?
@njsmith Huh. I think you sold me on streams. Yes, the API is unwieldy, and yes, I’ve been walking around thinking `read()` is silly for quite some time (although my gripe is more with the unbuffered version). (I think that you need a `read_at_least` thing as well, though, even if as a library function, because when you’re writing a parser, you really, really want it, which is why unbuffered `read()` sucks.)

But in that worldview, `makefile()` basically has no place in native Trio programs; those should use streams. Which means it remains largely a compatibility feature, which means, again, that it should be a method on vanilla-Python–compatible sockets. And rewriting the client two times (sync and async) or three (sync, asyncio and Curio, Trio) is still a big deal, even if you rightly feel reluctant about baking compatibility features into the library. So I rest my case regarding the place for `makefile()`.
And I think it’s unfair to characterize the current Trio interface as a complete substitute for sockets; I have to do UDP sometimes, and broadcasts, and sometimes both, and sometimes I even want to `connect()` and `makefile()` a UDP socket to pass it to some code that handles TCP as well. I literally want to do each of these in the thing I’m writing now, if only in a small number of places.

Don’t really see how I could reuse Trio streams under Curio, though, aside from reimplementing them.
(I think that you need a read_at_least thing as well, though, even if as a library function, because when you’re writing a parser, you really, really want it, which is why unbuffered read() sucks.)
Yeah, this is part of #796 too. No one thinks Trio is complete and finished yet :-). But the thing that needs to be added is some kind of helper function or wrapper class, not a part of the `Stream` interface itself.
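For instance, here is a minimal sketch of such a wrapper class (`LineReader` is an illustrative name, not a real Trio class; it only assumes the wrapped object has an async `receive_some` method):

```python
class LineReader:
    """Minimal buffered readline over any object with receive_some()."""

    def __init__(self, stream):
        self._stream = stream
        self._buf = bytearray()

    async def readline(self):
        # Pull chunks until the buffer contains a newline.
        while b"\n" not in self._buf:
            chunk = await self._stream.receive_some(4096)
            if not chunk:  # end of stream: return whatever is left
                line, self._buf = bytes(self._buf), bytearray()
                return line
            self._buf += chunk
        # Split off one line; keep any bytes after the newline buffered.
        idx = self._buf.index(b"\n") + 1
        line, self._buf = bytes(self._buf[:idx]), bytearray(self._buf[idx:])
        return line
```

Because it only calls `receive_some`, this works over any `ReceiveStream`-shaped object, which is exactly why the helper doesn't need to live in the interface itself.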
Which means it remains largely a compatibility feature, which means, again, that it should be a method on vanilla-Python–compatible sockets. And rewriting the client two times (sync and async) or three (sync, asyncio and Curio, Trio) is still a big deal, even if you rightly feel reluctant about baking compatibility features into the library.
Asyncio doesn't have `makefile` either. And sync `makefile` isn't compatible with async; the whole API is different. So I'm sympathetic to the compatibility issues; it's just not an easy problem. You certainly could implement a `makefile`-like API for trio and asyncio and then use it as a compatibility layer. You could also implement a `Stream`-like API for asyncio and curio. Both take some work; up to you what you think makes more sense for your situation, I guess.
Note that a simple (not necessarily the most efficient) implementation of `read_exactly` is just

```python
async def read_exactly(stream, count):
    buf = bytearray()
    while len(buf) < count:
        chunk = await stream.receive_some(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf
```
And since it only uses method calls on the `stream`, you can reuse that function for `Stream`s implemented using any library.
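For instance, here is that same sort of helper driven by plain asyncio instead of Trio; `MemoryStream` is a made-up stand-in for any object with a `receive_some` method, not a real library class:

```python
import asyncio

async def read_exactly(stream, count):
    # Identical logic to the helper above; nothing Trio-specific.
    buf = bytearray()
    while len(buf) < count:
        chunk = await stream.receive_some(count - len(buf))
        if not chunk:
            break
        buf += chunk
    return buf

class MemoryStream:
    # Hypothetical stand-in: yields data in small chunks, like a socket.
    def __init__(self, data, chunk_size=3):
        self._data, self._chunk_size = data, chunk_size

    async def receive_some(self, max_bytes):
        n = min(max_bytes, self._chunk_size)
        out, self._data = self._data[:n], self._data[n:]
        return out

async def main():
    assert await read_exactly(MemoryStream(b"abcdefgh"), 5) == b"abcde"

asyncio.run(main())
```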
And I think it’s unfair to characterize the current Trio interface as a complete substitute for sockets; I have to do UDP sometimes, and broadcasts, and sometimes both,
Sure, that's fair. I was just talking about `SOCK_STREAM` sockets. Obviously the `Stream` interface doesn't handle UDP, which isn't a `Stream` :-). (And even for TCP we do allow you to drop down to the raw socket layer briefly in case you need to do a `setsockopt` or something, while staying at the `Stream` layer the rest of the time.)
sometimes I even want to connect() and makefile() a UDP socket to pass it to some code that handles TCP as well.
That's interesting – I don't know much about this! Can you say more? Don't you run into issues with the UDP-oblivious code messing up packet boundaries? Do any of the read*
methods work?
There’s nothing much in there. You can `connect()` a UDP socket, which will do nothing on the wire, but will limit it to the specified peer address and port pair for `send()` and `recv()`. Then you can use `send()`/`recv()` or even `read()`/`write()` on it and it’ll do the right thing... Except it will still remain datagram-oriented, so a single datagram must correspond to a single call (and `read()` calls will discard data if you’re not careful). It’s just that my sending code takes care to send each application-level message in a single `write()` call.
So yes, you do run into issues with UDP-oblivious code, but if you control the code, you can avoid them. A smarter `BufferedReader`/`BufferedWriter` implementation could even do some shielding if it wanted to (always `recv()` into a datagram-sized buffer, only send a datagram on a `flush()`, etc.).
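A quick sketch of the behavior being described, using plain stdlib sockets and two UDP endpoints on loopback:

```python
import socket

# connect() on a UDP socket sends nothing on the wire; it just pins the
# peer address, so plain send()/recv() work without specifying it.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.bind(("127.0.0.1", 0))
rx.bind(("127.0.0.1", 0))
tx.connect(rx.getsockname())  # no packet is sent here
rx.connect(tx.getsockname())

tx.send(b"one app-level message")  # each send() is exactly one datagram
msg = rx.recv(65536)               # each recv() returns one whole datagram
tx.close()
rx.close()
```

The datagram-per-call framing is preserved throughout, which is exactly why the sending side must keep each application-level message in a single call.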
#796 is also highly relevant here
I'm new to open source. Shall I submit a pull request of parsing utils as outlined in that issue in order to continue that discussion? It's more a sans-I/O approach, so `read_exactly()` is synchronous, independent of any async mechanisms (trio or otherwise). It takes a bit of async glue code to feed arbitrary bytes into a buffer and extract objects, but it seems minor compared to the client rewrites @alexshpilkin and I wish to avoid.
The Trio abstract interfaces like Stream are pretty general though,... you could re-use them
I hadn't considered using different implementations of the `Stream` ABC as a way to re-use a parsing routine with different async drivers (Curio, asyncio, ...). My hunch is the particulars of a reactor and `await` semantics could be different enough to make it difficult to write truly re-usable parsers.
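The sans-I/O shape sidesteps that concern entirely: the parsing side is purely synchronous, and the async glue for any driver just calls `feed()` with whatever bytes it received. A minimal sketch (`ByteBuffer` is an illustrative name, not an existing library class):

```python
class ByteBuffer:
    """Sans-I/O receive buffer: feed() bytes in from any I/O layer,
    pull parsed pieces out synchronously."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf += data

    def read_exactly(self, count):
        # Returns None when more data is needed, so the async glue
        # (Trio, asyncio, Curio, or blocking code) knows to feed() more.
        if len(self._buf) < count:
            return None
        out = bytes(self._buf[:count])
        del self._buf[:count]
        return out
```

Since the class never awaits anything, the same parser code runs under any event loop; only the few lines that receive bytes and call `feed()` are driver-specific.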
What is it?
The `makefile()` method on Python sockets converts them to file-like objects, mapping `write` to `send`, etc. Trio’s documentation says that `makefile()` for Trio sockets is not implemented, and then goes on to describe an asynchronous version of Python’s file-like API.
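For reference, the synchronous behavior in question, shown here with a stdlib `socketpair` for brevity:

```python
import socket

# makefile() wraps a connected socket in a buffered file-like object,
# so write()/readline() are backed by send()/recv() underneath.
a, b = socket.socketpair()
writer = a.makefile("wb")
reader = b.makefile("rb")

writer.write(b"hello\n")
writer.flush()            # buffered: nothing hits the socket until flush
line = reader.readline()  # b"hello\n"

writer.close(); reader.close(); a.close(); b.close()
```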
Where is it useful?
In an asynchronous world, (asynchronous versions of) APIs like `json.dump()` become much more useful and potentially uniform across file and network I/O. However, without `makefile()`, they would need to exist in a `write` version and a `send` version, which would be silly.

(I’m implementing such an API right now.)