Open twisted-trac opened 15 years ago
@glyph set owner to @jyknight
I don't think that setting a timer is a viable solution to this problem, mainly because itimers aren't available in any currently released version of Python, and alarm's resolution makes it unhelpful.
Signals also have that inherent race condition that everybody's favorite bug talks about. Of course, it's pretty unlikely that you'll schedule an itimer, get swapped out and stop executing, then get scheduled such that your timer fires before you struggle your way on to the read() call and then block forever, but it's still possible.
I think threads are more likely to yield a correct solution. Start two threads: one for reading, one for writing. Use os.read to avoid file-object concurrency problems. Leave them running as long as StandardIO is active, and deliver the data to the main thread. You should be able to easily nuke a wedged thread and cleanly exit just by closing the appropriate FD.
(Reassigning to the reporter because while this poses an interesting intellectual challenge for me, I don't think that there's anything we can really do about terminal sharing - and we are talking about terminals here, because what other FD could you possibly find yourself sharing? My comment about '\x1b' was serious. If you don't own the output stream, there's nothing you can do to prevent other programs from puking up their guts on your display and putting it into a completely undefined, unknowable state. Want to ask the terminal what state it's in with the vt100 interrogation protocol? Too bad! It reported the cursor position, but then some random forked program moved it around before you could use that information. More closely related to this problem: who says your parent didn't set O_NONBLOCK on this FD? It created it, after all. No standard that I can find says that stdio needs to be blocking; it just usually is. Our thread-based solution should be prepared to cope with other similarly naive programs setting O_NONBLOCK, without breaking. A more realistic solution to the larger problem here would be to write a feature-complete clone of bash and screen and get the world to adopt a Twisted-based shell for everything, then give every spawned process its own PTYs so that they can be independently managed. The underlying I/O model is just broken and there's pretty much nothing anyone can do about it.)
Automation removed owner
@njsmith commented
There's more discussion of this issue in https://github.com/python-trio/trio/issues/174, including some ideas that haven't been discussed here.
@glyph commented
Anybody want to take this on? Node is gonna beat us to a fix on this! ;)
https://github.com/nodejs/node/commit/b5dda32b8a78c16453bb7c228878e474b8cd3461
@jyknight commented
Also http://homepages.tesco.net/J.deBoynePollard/FGA/dont-set-shared-file-descriptors-to-non-blocking-mode.html seems like a nice summary of the issue.
@jyknight commented
I agree that threads are the only way to do it. However, closing the fd is not a viable solution: 1) it doesn't appear to interrupt the syscall in progress, and 2) it's really dangerous in a multithreaded program, where another thread could come along and open a new file that gets assigned that same fd number.
This demo program demonstrates a method that might work to shut down the threads. The choice of signal is irrelevant, as long as it has a handler installed and SA_RESTART is not set.
```python
import threading, os, time, signal, ctypes, errno

# Any signal works, as long as a handler is installed without SA_RESTART.
signal.signal(signal.SIGCHLD, lambda *args: 1)

running = True

def t_func(fd):
    global thread_id
    thread_id = ctypes.pythonapi.pthread_self()
    print "started"
    try:
        while running:
            print os.read(fd, 100)
    except OSError, e:
        if e.errno != errno.EINTR:
            raise
    print "ended"

t = threading.Thread(target=lambda: t_func(0))
t.start()
time.sleep(1)
running = False
# Interrupt the read() the worker thread is blocked in.
ctypes.pythonapi.pthread_kill(thread_id, signal.SIGCHLD)
print "killed"
t.join()
print "joined"
```
@jyknight commented
PS: the scenario in which I ran into this had nothing at all to do with terminals. It's in a test suite.
There's a program whose whole job is basically to shuttle stdio to and from a socket. It uses nonblocking I/O for this. There's a server, which this program is connecting to, which writes some stuff to stdout.
The first program (in this instance) gets run with stdin pointing at a dedicated pipe and stdout unchanged, so it points at the testrunner's log, which just so happens to be a pipe. The output of this command is empty, or nearly so; the command is being run for its side effects. But stdin and stdout both get set non-blocking.
The server then logs some data to stdout, and pukes its guts out with an unexpected EINTR.
I'm pretty sure the opposite problem was happening to me too: in some cases the streams were getting set back to blocking out from under the nbio-expecting process, causing it to hang. But that's just a guess; I don't have an strace log of that happening.
This has just about nothing to do with terminals, except that it breaks terminals even worse than other things, since a terminal shares this flag between stdin, stdout, and stderr. So a t.i.stdio-using program which ever spawns a shell as a subprocess, sharing any of the streams (e.g. just stderr, as might be common...), is also totally broken, because the shell will set the terminal (all of stdin, stdout, and stderr) back to blocking.
@itamarst commented
So we don't forget: while ticket #2259 is fixed, the epoll reactor still doesn't support redirecting stdout or stderr to a file. Fixing this bug would solve that.
@exarkun commented
> Fixing this bug would solve that.
If the fix takes the form of never putting stdio file descriptors into the reactor, at least. There might be other ways to fix it.
Also see #4429 for the remaining epollreactor issue.
It seems that O_NONBLOCK is a property of open file descriptions (the kernel objects shared across dup() and fork()), not of file descriptors. This means that it's incorrect for any code to ever set it on a file descriptor that it did not just freshly create with open(), socket(), etc.
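A quick way to see that the flag lives on the file description rather than the descriptor (a small demonstration using a pipe and dup(); POSIX only):

```python
import fcntl
import os

# Two descriptors, one underlying open file description.
r, w = os.pipe()
r2 = os.dup(r)  # new descriptor number, same file description

# Set O_NONBLOCK through the first descriptor only.
flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The flag is visible through the dup -- it was set on the shared
# description, which is exactly what bites processes sharing stdio.
assert fcntl.fcntl(r2, fcntl.F_GETFL) & os.O_NONBLOCK
```

The same sharing happens across fork()/exec(), which is why a child flipping the flag on inherited stdio changes it for the parent too.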
t.i.stdio instead needs to do a workaround like the one http://cr.yp.to/unix/nonblock.html describes. That is: use select() etc. as usual to find out when the fd is probably readable/writable, then try to read/write it. However, if it turns out select() lied, you need a way to escape from the blocking read/write. So you have to set a timer that sends a signal to yourself (e.g. SIGALRM) around the read/write, so that it interrupts the syscall for you after a short period of time, in case select() lied.
This is totally sucky but I can't see a correct alternative (other than using threads, like the windows version does).
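The djb-style escape hatch could be sketched like this in Python. A hypothetical illustration only (the names are made up), and main-thread only, since Python runs signal handlers in the main thread; the handler must raise so the interrupted syscall isn't auto-retried (PEP 475):

```python
import os
import signal

class _ReadTimeout(Exception):
    pass

def _on_alarm(signum, frame):
    # Raising here makes the interrupted os.read() propagate this
    # exception instead of being silently retried (PEP 475).
    raise _ReadTimeout

def guarded_read(fd, n, timeout=0.1):
    """Hypothetical sketch: select() said fd was readable, but in case
    it lied, arm an interval timer so SIGALRM breaks us out of a read()
    that would otherwise block forever."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return os.read(fd, n)
    except _ReadTimeout:
        return None  # select lied; nothing was actually readable
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
```

The write side would be symmetric, and real code would also have to cope with the signal firing just before the syscall is entered, which is the race the earlier comments complain about.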
Searchable metadata
``` trac-id__3442 3442 type__defect defect reporter__jknight jknight priority__normal normal milestone__ branch__ branch_author__ status__new new resolution__None None component__core core keywords__ time__1221683399000000 1221683399000000 changetime__1561586126289770 1561586126289770 version__None None owner__ cc__exarkun cc__itamar cc__jknight cc__glyph cc__ezyang cc__njs@... ```