python-trio / trio

Trio – a friendly Python library for async concurrency and I/O

https://trio.readthedocs.io

Other

6.17k stars 338 forks source link

Add support for talking to our stdin/stdout/stderr as streams #174

Open njsmith opened 7 years ago

njsmith commented 7 years ago

There should be a convenient and standard way to read and write from the trio process's stdin/stdout/stderr streams. (Note that this is different from talking to the stdin/stdout/stderr of child processes, which is part of #4.) Probably this should use our standard stream abstraction.

Complications to consider:

Normally Python's I/O stack does a bunch of work here: text/binary conversion, newline conversion, buffering, convenience parsing things like readline and line iteration. (I think that's it - anything else?) We have to decide whether we want to re-use it (basically doing run_in_worker_thread for everything) or reimplement the parts we want. {Send,Receive,}TextStream classes seems like a reasonable thing to provide in general, and they should probably implement universal newline support too, why not. (Not sure we even need ABCs for these - they could just be concrete classes? though I suppose someone might eventually come up with a situation where they have an object that natively acts like this without any underlying binary stream, and want to explicitly declare that the interface is the same.) Buffering I'm somewhat dubious of – when applied to stdin/stdout/stderr it often causes user-visible problems (delayed output), it's redundant with buffering done by the kernel (as usual), and we try to minimize it in general. It's particularly bad if you want to speak some automated interactive protocol over stdin/stdout, which seems like a case that might come up in trio relatively often. And convenience parsing (readline etc.) might be better handled using sans-IO style protocol objects?

It might even make sense to do both; #20 might mean that we have a 3 line solution for the "wrap an io.TextIOWrapper object" approach if that's what you want, and then also provide a lower-level more-direct stream-based API.

On Windows, the only reliable way to do non-blocking I/O to the standard streams is via threads. In particular, it's the only thing that works if we're connected to a regular console. Everywhere else, non-blocking I/O is possible (and the sensible thing if we do decide to cut out Python's io stack). Edit: See update below.

On Windows, you often have to do these separate console control calls for things like cursor movement and coloring text, which need to be synchronized with the output stream. (In the very very latest Win 10 update they finally added VT100 support to the console, but it will be a while before anyone can count on that.) I believe that the output is still binary (UTF-16) rather than using some kind of first-class text read/write API.

I know prompt_toolkit has an async API and they support a lot of fancy terminal stuff in pure Python - we should check what they need to make sure whatever we come up with matches.

njsmith commented 7 years ago

io.IncrementalNewlineDecoder might be useful if we need to implement our own universal newline support. It's not documented, unfortunately.

buhman commented 7 years ago

Related to #4, in previous projects I've played with feeding ptys to subprocesses instead of pipes (not sure about the correctness of the below):

import asyncio
from asyncio.base_subprocess import ReadSubprocessPipeProto
import os
import pty

async def subprocess_exec_pty(protocol_factory, *args, **kwargs):
    loop = asyncio.get_event_loop()

    stdout_master, stdout_slave = pty.openpty()
    stderr_master, stderr_slave = pty.openpty()

    transport, protocol = await loop.subprocess_exec(
        protocol_factory, *args,
        stdout=stdout_slave, stderr=stderr_slave, **kwargs)

    _, pipe = await loop.connect_read_pipe(
        lambda: ReadSubprocessPipeProto(transport, 1),
        os.fdopen(stdout_master, 'rb', 0))
    transport._pipes[1] = pipe

    _, pipe = await loop.connect_read_pipe(
        lambda: ReadSubprocessPipeProto(transport, 2),
        os.fdopen(stderr_master, 'rb', 0))
    transport._pipes[2] = pipe

    return transport, protocol

separate console control calls for things like cursor movement

Unless we're reimplementing prompt_toolkit, is this required to provide a valid {Send,Receive,}TextStream ? The inverse of above, what if we just presume our stdin/stdout is always not a console?

I think it would be convenient if the API mirrored the stdlib a little: trio.stdout or something might even be ok.

Is the implementation here specifically that we set sys.{stdin,stdout,stderr}.fileno to nonblocking, and provide our own *Stream classes that provide specialized asynchronous file object interfaces? Is it considered invalid user behavior to use print() afterwards? Shouldn't we provide some await print() too? What happen when someone wants to use a stdout logging handler?

buhman commented 7 years ago

https://github.com/twisted/twisted/blob/twisted-17.5.0/src/twisted/internet/stdio.py

twisted seems to have no trouble setting nonblocking in t.i.process for stdout,stderr,stdin on non-windows
https://github.com/twisted/twisted/blob/twisted-17.5.0/src/twisted/internet/process.py#L159 is interesting
- caused by: https://twistedmatrix.com/trac/ticket/2259
- we should definitely have a test for that

Yet:

https://twistedmatrix.com/trac/ticket/3442

asyncio doesn't support this directly yet (nice):

https://github.com/python/asyncio/issues/213

njsmith commented 7 years ago

Oh wow yeah this is way nastier than I had realized.

So the absolute simplest solution would be to suggest people use wrap_file(sys.stdin) etc. (Or some equivalent convenience API.) That's effectively the only thing that works on Windows, and it's the only thing that works on unixes when an fd has been redirected to a file, and it's by far the simplest thing that avoids the nasty issue with different processes fighting over non-blockingness state. For those who want to speak some protocol over stdin/stdout we can make simple implementations of ReceiveStream/SendStream that delegate to a wrapped unbuffered binary file object. That all would work. And it works today, which is nice.

It has the downside that it's probably pretty slow compared to doing real non-blocking io in the cases where that's possible. So there's a specific use case we're talking about where this might be inadequate, the one where you're specifically trying to push bulk data through the standard descriptors, probably talking between two programs. So one question is whether and how we can do better for this case. Can we detect when the fd supports non-blocking operation? (Apparently from the twisted discussion it sounds like epoll will refuse to work, so that's one indication if nothing else. Not sure if kqueue works the same way. I guess just setting and then checking the nonblocking flag might work.) If we can detect that, then we can potentially offer two modes: the "always works" mode, and the "always works as long as no one else minds us setting things to non-blocking", and people who need speed and don't mind taking a risk can use the latter.

I don't know how important this feature is in practice. It might not be worth the complexity.

njsmith commented 7 years ago

Oh, here's another fun issue to keep in mind: TextIOWrapper objects are not thread safe. This means that if we naively wrap_file(sys.std...), then the resulting object is unsafe to call from multiple tasks simultaneously. Which is worrisome because it's a global object. Perhaps some locking is in order.

The globalness of the standard descriptors causes several problems, actually. If we set them non-blocking, then it's not just other processes that get messed up, it's also naive calls to print or similar within our process. Obviously print is not a great thing to be calling all over the place in a robust app that wants to make sure it never blocks, but for things like debugging or test output it's pretty useful. pdb.set_trace is another example of why we might want to keep stdin/stdout working in blocking mode.

... And actually this is also trickier than it might seem, because the thread safety issue also applies between the main thread and worker threads, i.e. even if trio.stdout has some locking inside it so that all calls that go through it serialized, then they can still race with direct accesses to sys.stdout. It's possible that we could avoid this by using two different TextIOWrapper objects pointed at the same underlying BufferedIO object, which creates different possibilities for corrupt output when both are used at the same time, but at least internal data structures would survive.

Anyway, one thing this makes clear is that the decision to use the standard fds for programmatic purposes is really not something to take lightly – if you're going to do it then the whole program needs to agree on how.

Oh, I just remembered another fun thing about stdin: trying to read from it can cause your whole program to get suspended (SIGTSTP).

njsmith commented 7 years ago

Whoops, I don't mean SIGTSTP, I mean SIGTTIN and SIGTTOU. And apparently both writing and reading can trigger suspension.

buhman commented 7 years ago

unsafe to call from multiple tasks simultaneously

What about task-local storage?

but for things like debugging or test output it's pretty useful

That's why I was saying an 'await print()' helper would be useful.

njsmith commented 7 years ago

Hmm, here's another trick, but it might not be widely applicable enough to be worthwhile: the send and recv syscalls accept a flags argument, and one of the flags you can pass is MSG_DONTWAIT, which makes a socket effectively non-blocking just for this call, without affecting any global state.

But... AFAICT this is supported only on Linux, not Windows or MacOS. On MacOS, the MSG_DONTWAIT constant appears to be defined, but it's not mentioned in the send man page, and it doesn't seem to work:

# MacOS
In [1]: import socket

In [2]: socket.MSG_DONTWAIT
Out[2]: 128

In [3]: a, b = socket.socketpair()

In [4]: while True:
   ...:     print("sending")
   ...:     res = a.send(b"x" * 2 ** 16, socket.MSG_DONTWAIT)
   ...:     print("sent", res)
   ...:     
sending
[...freezes...]

And on Windows it doesn't appear to be either documented or defined.

And, even on Linux, it only works on sockets. If I try using nasty tricks to call send on a pty, I get:

# Linux
In [3]: s = socket.fromfd(1, socket.AF_INET, socket.SOCK_STREAM)

In [4]: s.send(b"x")
OSError: [Errno 88] Socket operation on non-socket

and similarly on a pipe:

# Linux
In [5]: p1, p2 = os.pipe()

In [6]: s = socket.fromfd(p2, socket.AF_INET, socket.SOCK_STREAM)

In [7]: s.send(b"x")
OSError: [Errno 88] Socket operation on non-socket

This has me wondering though if there's any other way to get a similar effect. There was a Linux patch submitted in 2007 to make Linux native AIO work on pipes and sockets; I don't know if it was merged, but in principle it might be usable to accomplish a similar effect.

On pipes, if no-one else is reading from the pipe, then the FIONREAD ioctl can be used to find out how many bytes are ready to be read, so reading that much won't block. Of course, someone else might be reading from the pipe at the same time, steal them out from under you, and then you get blocked for an arbitrary amount of time, whoops. And for writing, there doesn't seem to be any similar trick (you can use F_GETPIPE_SZ to find out how big the pipe buffer is, but not how full it is; possibly there's some undocumented IOCTL somewhere that I'm missing). So maybe this is useless.

Maybe we should focus on making threaded I/O as fast as possible :-)

Unrelated issue: there's also some question about how a hypothetical trio.stdout should respond if someone replaces sys.stdout. This is a fairly common and supported thing. If we just do trio.stdout = trio.wrap_file(sys.stdout), and then someone does sys.stdout = ..., then trio.stdout will keep pointing to the old stdout. OTOH, if we make trio.stdout a special object that always looks up sys.stdout on every call, then... it won't work, because of locking issues. Le sigh.

njsmith commented 7 years ago

What about task-local storage?

Task-local storage would be useful if there were some way to give each task its own private stdin, stdout, etc., but.... I'm not sure what that would mean? :-) Those are kind of inherently process-global resources.

That's why I was saying an 'await print()' helper would be useful.

await trio.print might well be useful (isn't it lucky that in Python print is a regular function, not a piece of special syntax?), but it doesn't help for sticking a quick debug print in a sync function, or for the pdb.set_trace() case.

njsmith commented 7 years ago

Update: Apparently I was wrong! On Windows, It is possible to read/write to the console without doing blocking read/write in threads. Which is good, because ReadFile and WriteFile on the console can't be cancelled, and we'd really like to be able to cancel these operations (e.g. because the user hits control-C).

This stackoverflow question seems to have reasonable info (once you filter through all the partial answers). AFAICT, the basic idea is that you call GetStdHandle to get a HANDLE pointing to the console, which can be passed to one of the WaitFor functions, and once that returns then ReadConsoleInput can be called to pull out "events", which might be keypresses or might be other things like mouse movements. We need to support WaitFor anyway (#233), so this is all pretty reasonable. And for output, I guess you just use WriteConsole and friends (SetConsoleTextAttribute etc.), and since these can only be used to write to the console, they might be slow (and you might want to push them off into a worker thread), but they shouldn't block indefinitely.

Now, all the APIs mentioned in the previous paragraph assume that your program is attached to a regular console (like a TTY on unix). And you can always get access to whatever console you're running under (if any) by opening CONIN$ or CONOUT$, sort of like opening /dev/tty on Unix, which might be useful sometimes. But for most purposes, we want to also do something sensible when stdin/stdout/stderr are redirected, and in this case all of the above APIs will just error out, and we need to fall back on some other strategy. There are five cases that I know of:

standard stream is a console object
standard stream is a socket with OVERLAPPED support disabled (if a socket has OVERLAPPED support enabled then it can't be used as a standard stream, because... Windows)
standard stream is a named pipe (I think OVERLAPPED might or might not be enabled)
standard stream is an anonymous pipe
standard stream is an actual on-disk file

The first case (magic console objects) is described above.

Socket without OVERLAPPED support: well, we can use select and non-blocking I/O, though this might be tricky if we end up switching trio.socket to using IOCP (#52). I guess blocking I/O in a thread + CancelSynchronousIo might work? It might be possible to enable OVERLAPPED I/O via ReOpenFile? (It also has poorly-documented limitations.)

Named pipe: can't assume OVERLAPPED is available; maybe ReOpenFile works, maybe not. Anonymous pipe: these are basically named pipes, except with a bonus limitation: "Asynchronous (overlapped) read and write operations are not supported by anonymous pipes" (ref). So we'd need some strategy that doesn't use IOCP, I think.

On-disk files: well, here just plain old threads are OK, because reading/writing to a file might be slow but it shouldn't block indefinitely.

So tentatively I'm thinking:

We probably want some lowish-level Windows-specific API for talking to the console, that wraps ReadConsoleInput and WriteConsole+friends in a thin layer of trio compatibility code.
We should experiment with CancelSynchronousIo to figure out whether it can be used together with blocking ReadFile/WriteFile to handle the other cases here. [Edit: alternatively, we should experiment with WaitForSingleObject to see if it can be used to tell whether a subsequent call to ReadFile/WriteFile will block, similar to what python-prompt-toolkit does on Unix (see comment below). Though this discussion isn't promising....]
If both of these work out, then we can provide a layer on top that figures out which kind of console stream we have, and then uses the appropriate lower-level API to expose a standard Stream-based interface.

njsmith commented 7 years ago

Also, note for reference: looking at the python-prompt-toolkit code, it appears that the way they do async interactive applications on Unix is to select to see if a standard stream is readable/writable, and then issue a blocking read/write, i.e. they leave the streams in blocking mode and then cross their fingers that this won't bite them. And I guess they get away with it, because I don't see any bug reports related to this...

njsmith commented 7 years ago

Further Windows update: while I still can't find any references to CancelSynchronousIo working on console reads through web search, @eryksun claims in this message that it does with some caveats. (Eryk, if you happen to have any thoughts on this thread in general that'd be very welcome... The topic is, how can one reliably read/write to stdin/stdout without blocking the main thread, and so that all operations that might block indefinitely are cancelable.)

Another note: GetFileType may also be useful here.

eryksun commented 7 years ago

Unfortunately canceling a console read via CancelSynchronousIo doesn't work prior to Windows 8. I haven't seriously used Windows 7 in a long time, so I forget about its limitations until I go out of my way to test on it. I should have known better. The console has only had a real device driver since Windows 8. Older versions use an LPC port to communicate between a client and the console. In this case console buffer handles are allocated by the console itself. These pseudohandles are flagged with the lower 2 bits set (e.g. 3, 7, 11), so regular I/O functions know to redirect to console functions (e.g. CloseHandle -> CloseConsoleHandle). Without an actual I/O request, there's nothing for CancelSynchronousIo to cancel.

njsmith commented 7 years ago

libuv has a clever trick! If you want to set stdin/stdout/stderr non-blocking, and it's a tty, then you can use os.ttyname to get the device node for this tty and open a second copy of it. And this is by far the main case where we might have other programs confused by sharing stdin/stdout/stderr. (Read the patch and probably check the current code, there are a number of subtleties.)

That blog post also mentions that kqueue on MacOS doesn't work on ttys, which would be super annoying, but apparently this got fixed in 10.7 (Lion). I don't think we need to care about supporting anything older than 10.7. Apparently even 10.9 is already out of security-bugfix-land. (ref)

njsmith commented 7 years ago

@remleduff has made a remarkable discovery: on Linux, libuv's clever trick of re-opening the file can actually be done on anonymous pipes too, by opening /proc/self/fd/NUM. I guess this makes some sense if you recall that FIFOs can be opened multiple times for read and/or write, and anonymous pipes and FIFOs are the same thing, but I was still shocked. (On MacOS there's /dev/fd/NUM, but opening this unfortunately seems to just dup the existing fd rather than actually re-opening it.)

So this means that technically on Linux I think we actually can handle every common case:

tty or pipes: re-open
files on disk: use threads, like any other on disk file
character devices like /dev/zero or /dev/urandom: re-open probably works? Are character devices ever stateful in such a way that re-opening them gives an fd that acts differently from the original? For example, do character devices ever support seeking? It looks like they do on FreeBSD. But it's easy to test if a fd is seekable with os.lseek(fd, 0, os.SEEK_CUR). [Edit: well, modulo files that claim to be seekable but are lying, if those really exist.] Are there any non-seekable files that nonetheless maintain per-open state?
sockets: re-open fails, but MSG_DONTWAIT works

The first three cases cover the vast vast vast majority of stdin/stdout/stderr configurations that actually occur in practice. I'm not sure sockets are common enough to justify a whole extra set of code paths, but maybe.

njsmith commented 7 years ago

I also spent some time trying to figure out if there was a way to making blocking I/O cancellable. ~~I don't think there is.~~ maybe there is?

The first idea I considered is: start a thread that will sit blocked in read or write, and then if we want to cancel it, use pthread_kill to send a signal to trigger an EINTR. The problem is that this is inherently racy for the same reason that pthread cancellation requires special kernel help – you might have the signal arrive just before entering the syscall, or it might arrive just after successfully exiting, and you want to treat these differently (in the first case return failure, in the second case return successfully), but there's absolutely no way to tell the difference between them except by examining the instruction pointer, which requires you write your own asm for doing syscalls. So that's out.

The second idea I considered is: dup the fd, issue a read or write on the dup, and then if you want to cancel the read or write early, close the dup. (We can't close the original, because we still need it, but we can close the dup.) Unfortunately, on Linux at least this doesn't work: read on a pipe doesn't actually return until the write side of the pipe has been fully closed (i.e., there are no remaining fds pointing to it). If the fd it's actually using disappears out from under it then oh well, it doesn't care. ...And even if this worked, there'd still be a race condition if we closed the fd just before entering read, because it could be re-opened by another thread in the mean time. I guess we could fix the race condition by dup2ing a bad fd on top of the original fd, but that still doesn't help with the part where you can't wake it up.

OH WAIT THOUGH. What if we combine these. Option 3: dup the fd. Dispatch a blocking read or write to a thread using the dup. On cancellation, use dup2 to atomically overwrite the original fd with one for which we know read/write will fail (e.g. an unconnected socket). Then use pthread_kill to send a no-op signal to the worker thread.

If the dup2 happens before we enter read/write, then they'll fail immediately, like we want.

Otherwise, it means the dup2 happens after we enter read/write, which implies that the signal does as well. So one possibility is that the signal arrives while we're still in read/write. In this case it returns early with EINTR, CPython attempts to re-issue the call, and then the new calls fails with EBADF because this is after the dup2. Alternatively, the signal arrives after the read/write have completed, in which case it does nothing, which is again what we want.

This still has the problems that we have to claim a signal, and if we're running outside the main thread then Python doesn't provide an API for registering a signal handler (and I'm pretty sure that to get EINTR we need to have a C-level signal handler registered, even though we want it to just be a no-op). But we could potentially grab, like, SIGURG which hopefully no-one actually uses and is ignored by default, and use ctypes to call sigaction.

This is kind of a terrible idea, but I do think it would work reliably and portably on all Unixes for all fd types.

njsmith commented 7 years ago

DJB has some commentary on how properly written kernels should do things, which is completely correct and yet useless in practice, alas: https://cr.yp.to/unix/nonblock.html

njsmith commented 6 years ago

I guess this is some kind of argument for... something: https://gist.github.com/njsmith/235d0355f0e3d647beb858765c5b63b3

(It exploits the fact that setuid(getuid()) is a no-op except that limitations of the Linux syscall interface mean that libc setuid wrappers in multi-threaded programs have to seize control of all the other threads, which they do by sending them a signal, so this forces all other threads to restart whatever syscalls they were doing.)

njsmith commented 6 years ago

Here's the discussion about this in mio: https://github.com/carllerche/mio/issues/321

It looks like libuv has an amazing thing where their tty layer on windows actually implements a vt100 emulator in-process on top of the windows console APIs: https://github.com/libuv/libuv/blob/master/src/win/tty.c

I looked at SIGTTIN/SIGTTOU again. This is a useful article. It sounds like for SIGTTIN, you can detect when you've been blocked from reading (ignore SIGTTIN, and then read gives EIO), but when writing you just get a signal + EINTR, which is pretty awkward given that Python signal delivery happens after some delay and that the os.write handler unconditionally retries on EINTR. Also, in both cases, AFAICT there's no way to get a notification when you can retry; you just have to poll. Maybe we should just ignore this issue and document it as a limitation – sometimes your process will get put to sleep, deal with it. (I doubt it's an issue for most of the cases where people want to speak protocols on stdin/stdout.)

remleduff commented 6 years ago

You've probably seen this already, but Windows 10 has been making large changes (improvements) to console handling. Is it better to have a wait-and-see attitude on this one, and just try to make it work really well starting with Windows 10?

https://blogs.msdn.microsoft.com/commandline/2018/06/20/windows-command-line-backgrounder/

njsmith commented 6 years ago

The console changes are great, but unfortunately, as far as I know none of them change the basic api that apps use to talk to their stdin/stdout when it's a console.

That API did get some work in win 8 – in particular some possibly useful cancellation support – but there's still no real async API afaik.

njsmith commented 5 years ago

More discoveries:

The discussion above concluded that on Windows 7, if you want to cancel a synchronous console read, then CancelSynchronousIo doesn't work. However, on Win 7, there is some hint here that you might be able to use CloseHandle to interrupt a ReadConsole: https://github.com/libuv/libuv/issues/852
The Python colorama package actually does something similar to libuv's vt100 emulation: it translates (a useful subset of) vt100 into Windows console commands. (Of course on recent-enough Windows you can just enable the native vt100 mode, but having both options available is nice!) It looks like we could pretty much just... use it for output.
The Windows console is natively Unicode, not bytes, which creates some conundrums. This means that on Windows stdout is sometimes a byte-stream and sometimes it's a text-stream, but we'd like open_stdout to return a consistent type... Fortunately PEP 528 standardized how to cope with this in Python: we pretend the console is a utf8 bytestream. PEP 528 itself is only implemented in py36. There's also a pure-python backport library. Or we could just do our own codec handling.

General strategy

I'm thinking we'll want:

A low-ish level "tty API". For example, you can use this open your controlling tty, even if it's not stdin/stdout/stderr. It's a Stream, with maybe some extra tty-specific features (window size query? notification when window size changes? toggling cooked/raw mode?). Probably trio.open_tty() to get the controlling tty, plus on Unix an option to wrap an fd. (On Windows the controlling tty is the only interesting tty.)
- On Unix, this is probably just the same code as we use for pipes, possibly with whatever snazzy extras we come up with. open_tty re-opens /dev/tty to get its own independent fd, so we can toggle non-blocking without breaking everyone.
- On Windows, this is a complex fancy object that uses the low-level console-specific APIs, utf8 codec, colorama, etc.
A low-ish level "random pipe API".
- On Unix, this is #829. This API makes the fd non-blocking, and it's your job to make sure that that's OK. It does set it back again when it's done though.
- On Windows, this probably involves choosing between 3 different implementations: our current named pipe support (for any handle that supports IOCP) (see also #824), SocketStream (for socket handles), and some hacky thing that uses threads + CancelSynchronousIo (for everything else).
A stdin/stdout/stderr API: probably open_stdin() -> ReceiveStream, open_stdout() -> SendStream, open_stderr() -> SendStream, open_stdio() -> StapledStream (or maybe -> Union[StapledStream, TTYStream])
- These do the necessary autodetection to figure out what kinds of streams we're dealing with, and then use the APIs above to actually produce it.
- On Unix, they jump through the necessary hoops to re-open ttys, pipes, etc. to minimize how often we end up putting stuff into non-blocking mode. But in the cases where this is hard to avoid, then we go ahead and put them in non-blocking mode.
- To the maximum extent possible, we should make these streams independent of sys.stdin and friends. If people want to use pdb, or print debugging, that's cool, we should try to make it just work. I think we should mostly be able to manage this by creating separate fds/handles and leaving the default ones alone, but if that turns out to be impossible, we could also consider replacing sys.stdin and friends with wrappers that do something sensible (temporarily switch back into blocking mode when accessed? error out?)
We should provide a trio.input, that's like builtins.input but async and routed through our stdio-handling machinery. Probably it just calls receive_some(1) a bunch of times until it sees a newline.
- On Unix, builtins.input uses readline. That is probably too much to hope for... I mean, I guess there's python-prompt-toolkit but trio should probably not depend on python-prompt-toolkit. We can at least think about if there's any way to get line-editing, but it's absolutely not required for the first version.
  - Alternatively, it looks like readline actually has a relatively simple async API?

njsmith commented 5 years ago

TODO: check how usable the above would be for python-prompt-toolkit / urwid

oremanj commented 5 years ago

Interesting email about how Linux manages file reference counting, vaguely relevant to parts of this discussion: https://lore.kernel.org/linux-block/20190129192702.3605-1-axboe@kernel.dk/T/#m72b2f6d99dfb9e699ffcbe899d02b293afaa9608

njsmith commented 5 years ago

Allegedly Win7 is EOL on January 14, 2020. (I guess all things named "7" go EOL at the same time?) So maybe we don't need to care about it very much? This is important because Win8 is where it became possible to cancel console reads. OTOH as of right now it apparently still has like 30% market share.

It's not clear whether cancelling console reads is that exciting anyway; if you're using ReadConsoleInput to get raw characters then you don't really need it to be cancellable, because you can do all the blocking parts with WaitForSingleObject (which is cancellable). And if you're on Windows 8 or earlier, I think this is the only way to implement line editing (e.g. I think you can't get arrow keys through ReadConsole, only ReadConsoleInput?).

BUT... the downside is that if you use ReadConsoleInput, then you'd better be prepared to implement a full line editor, because it basically just gives you raw keypresses – there's no "cooked mode" equivalent where you get basic line editing and line buffering for free. If you want that, you have to use ReadConsole and friends. And on Win10 you can also enable VT100 handling through ReadConsole if you want to do fancy stuff like write your own line-editor, and you can use CancelIoEx or CancelSynchronouseIo for cancellation, and everything is fine. And on Win8 you can at least use ReadConsole in cooked mode, even if it doesn't have VT100 support. But on Win7 this doesn't work at all, because there's no way to escape from a call to ReadConsole (except maybe?? by closing the console handle).

I guess one option is to start by targeting Win8/Win10, and then decide whether it's worth implementing some home-grown basic line-editor/VT100-emulation/etc. for Win7.

eryksun commented 5 years ago

And on Win10 you can also enable VT100 handling through ReadConsole if you want to do fancy stuff like write your own line-editor, and you can use CancelIoEx or CancelSynchronouseIo for cancellation, and everything is fine.

Cancelling a console read is clunky, unfortunately. The console host (conhost.exe) doesn't cancel the cooked read when the request is cancelled. At best, the line gets discarded when the user presses enter. At worst, the console crashes, as I've just discovered while testing this again.

if you're using ReadConsoleInput to get raw characters then you don't really need it to be cancellable, because you can do all the blocking parts with WaitForSingleObject (which is cancellable).

ReadConsoleInputW won't block if there's at least one input event in the buffer. We can wait on the console input handle for the arrival of this input event. If it's alertable, the wait can be interrupted by queueing a user-mode APC. Or we can use a multiple-object wait that includes a kernel event object.

e.g. I think you can't get arrow keys through ReadConsole, only ReadConsoleInput?

Right. With ReadConsoleW, we can disable the 'cooked' aspects (line input, echo input, processed input), but we still can't read key presses for various keys such as modifiers, function keys, arrows, escape, and home -- not unless virtual-terminal input is enabled in Windows 10.

FYI, the pyreadline package implements readline for the Windows console using low-level ReadConsoleInput.

njsmith commented 5 years ago

Cancelling a console read is clunky, unfortunately. The console host (conhost.exe) doesn't cancel the cooked read when the request is cancelled. At best, the line gets discarded when the user presses enter. At worst, the console crashes, as I've just discovered while testing this again.

Huh, I believe you but it makes me wonder why Raymond Chen seems to think it works fine :-): https://devblogs.microsoft.com/oldnewthing/?p=44413

ReadConsoleInputW won't block if there's at least one input event in the buffer. We can wait on the console input handle for the arrival of this input event. If it's alertable, the wait can be interrupted by queueing a user-mode APC. Or we can use a multiple-object wait that includes a kernel event object.

Right, and Trio already has a convenient WaitForSingleObject abstraction (the implementation currently uses WaitForMultipleObjects and passes in a kernel event object, like you say). But then you have to interpret the input you get, and it would be really nice if we could present an API to users that was just like "write utf8+vt100", "read utf8+vt100", "check for window size change", "toggle cooked mode", like everyone expects on Unix, and for compatibility with cases where you want to speak some byte-oriented protocol over stdin/stdout but run it on the console for testing. Adapting ReadingConsoleInputW into that API seems quite annoying, but maybe we have no choice if we want anything to work on Win <10.

njsmith commented 5 years ago

...Oh wait, and you're actually saying that if you want cooked mode, and cancellation, then even on Win 10 you're doomed to implement your own cooked mode from scratch.

njsmith commented 5 years ago

On Win 10, does ReadConsoleW at least handle cancellation well if you're in raw mode? (Ideally with vt100 support turned on?)

njsmith commented 5 years ago

Huh, libuv has an interesting strategy for cancelling a console read in cooked mode: it pushes a carriage return into the input buffer, and then immediately rewrites the console output to hide that fact!

https://github.com/libuv/libuv/blob/ee24ce900e5714c950b248da2bdd311b01c983be/src/win/tty.c#L1040-L1104 https://github.com/libuv/libuv/blob/ee24ce900e5714c950b248da2bdd311b01c983be/src/win/tty.c#L522-L548

[Edit: here's the PR: https://github.com/libuv/libuv/pull/866]

[Edit 2: great bit in the PR log: "have you considered doing something else?" "Unfortunately, there isn't another way. I have brought this up to the team in Windows who works on the console APIs."]

eryksun commented 5 years ago

Cancelling a console read is clunky, unfortunately. The console host (conhost.exe) doesn't cancel the cooked read when the request is cancelled. At best, the line gets discarded when the user presses enter. At worst, the console crashes, as I've just discovered while testing this again.

Huh, I believe you but it makes me wonder why Raymond Chen seems to think it works fine :-): https://devblogs.microsoft.com/oldnewthing/?p=44413

Raymond's toy program is exiting stage left instead of sticking around to live with the painful consequences. Below I've modified his program to add a loop:

#include <stdio.h>
#include <windows.h>

DWORD CALLBACK
cancelProc(void *p)
{
    Sleep(4000);
    CancelIoEx(GetStdHandle(STD_INPUT_HANDLE), NULL);
    return 0;
}

int
wmain(int argc, wchar_t **argv)
{
    while (1) {
        char buffer[80], *result;
        HANDLE hThread = CreateThread(NULL, 0, cancelProc, NULL, 0, NULL);
        if (!hThread) {
            DWORD lastError = GetLastError();
            fprintf(stderr, "Error creating cancel thread: %d\n", lastError);
            return lastError;
        }
        printf("Type something: ");
        result = fgets(buffer, sizeof(buffer), stdin);
        TerminateThread(hThread, 0);
        CloseHandle(hThread);
        if (result != NULL) {
            printf("TYPED: %s", result);
        } else if (ferror(stdin)) {
            DWORD lastError = _doserrno;
            if (lastError && lastError != ERROR_OPERATION_ABORTED) {
                fprintf(stderr, "Error reading stdin: %d\n", lastError);
                return lastError;
            }
            clearerr(stdin);
            printf("\nTIMEOUT\n");
        } else if (feof(stdin)) {
            break;
        }
    }
    return 0;
}

Here's an example run that shows how the 'canceled' reads get queued up in the console:

Type something:
TIMEOUT
Type something:
TIMEOUT
Type something: 1
2
3
TYPED: 3
Type something:
TIMEOUT
Type something:
TIMEOUT
Type something: ^Z
^Z
^Z

eryksun commented 5 years ago

[Edit 2: great bit in the PR log: "have you considered doing something else?" "Unfortunately, there isn't another way. I have brought this up to the team in Windows who works on the console APIs."]

piscisaureus may be right that there could be a slightly better way than writing enter to the buffer. The pInputControl parameter of ReadConsoleW can set a bitmask (dwCtrlWakeupMask) of one or more ASCII control codes to immediately return from a cooked read, with the control character left in place in the buffer. cmd.exe implements tab (^I) filename completion using this feature combined with screen rewriting. I haven't tried it with WriteConsoleInput, however.

njsmith commented 5 years ago

As usual, the only useful documentation on dwCtrlWakeupMask is rumors and innuendo:

https://stackoverflow.com/questions/43836040/win-api-readconsole https://stackoverflow.com/questions/43863509/how-to-send-eof-from-command-prompt-without-newline ← Eryk again

So basically it sounds like you can specify any subset of the ascii control characters (0-31), and they basically act like extra end-of-line characters. Most or all of these characters can also be entered by users e.g. control-A → 0x01. That makes this a user-visible change – if you use 0x01 as your wakeup character, then anyone who hits control-A will cause ReadConsole to return immediately. We can detect what happened because we know whether we injected a fake control-A or not, but it terminates the line-editing mode anyway, so users could notice. The advantage of using VK_ENTER to cancel like libuv does is that users already expect this to terminate line-editing mode. But, I'm not sure if there's any way to type NUL, or characters 0x1c through 0x1f, so maybe they would work? (0x1a = control-Z, 0x1b = escape).

jeremysalwen commented 4 years ago

The careful thought put into the design here is much appreciated. However, I have only a simple question: Is the suggested workaround still

trio.to_thread.run_sync(blocking_io_command)

oremanj commented 4 years ago

Currently your choices for stdin are trio.to_thread.run_sync (all platforms, works fine except that it isn't cancellable, so if you use it for interactive stdin you won't be responsive to Ctrl+C), trio.lowlevel.FdStream(os.dup(0)) (Linux and macOS, creates problems if you continue to use the blocking sys.stdin), or trio.lowlevel.FdStream(os.open("/proc/self/fd/0", os.O_RDONLY)) (Linux only, should work fine in all cases I know of).