twisted / twisted

Event-driven networking engine written in Python.
https://www.twisted.org

t.i.stdio must not set nonblocking on stdio FDs #3442

Open twisted-trac opened 15 years ago

twisted-trac commented 15 years ago
@jyknight reported
Trac ID trac#3442
Type defect
Created 2008-09-17 20:29:59Z

It seems that O_NONBLOCK is a property of open file descriptions (the kernel objects shared across dup() and fork()), not of file descriptors. This means that it's incorrect for any code to ever set it on a file descriptor that it did not just freshly create with open(), socket(), etc.
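
As a minimal illustration of that sharing (not from the ticket, just the standard `os` and `fcntl` modules):

```python
import fcntl
import os

r, w = os.pipe()
r2 = os.dup(r)  # r and r2 are different descriptors for the same open file description

flags = fcntl.fcntl(r, fcntl.F_GETFL)
fcntl.fcntl(r, fcntl.F_SETFL, flags | os.O_NONBLOCK)  # set O_NONBLOCK via r only

# The flag is visible through the duplicate too, because O_NONBLOCK lives on
# the shared open file description; the same thing happens across fork() and
# with fds inherited from a parent process such as a shell.
assert fcntl.fcntl(r2, fcntl.F_GETFL) & os.O_NONBLOCK
```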

t.i.stdio needs to instead do a workaround like http://cr.yp.to/unix/nonblock.html describes. That is: use select/etc as usual to find out when the fd is probably readable/writable, then try to read/write it. However, if it turns out select lied, you need a way to escape from the blocking read/write. So you have to set a timer to send a signal to yourself (e.g. SIGALRM) around the read/write, so that it interrupts the syscall for you after a short period of time, in case select lied.
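
A rough sketch of that workaround in Python terms, for illustration only: this is not Twisted code, it assumes the read happens in the main thread (so the signal handler can raise into it), and it still has the timer race discussed in the comments below.

```python
import os
import select
import signal


class _ReadTimeout(Exception):
    """Raised by the SIGALRM handler to abort a read that select lied about."""


def _on_alarm(signum, frame):
    # Raising here prevents Python from transparently retrying the read.
    raise _ReadTimeout()


signal.signal(signal.SIGALRM, _on_alarm)


def careful_read(fd, nbytes, timeout=0.01):
    # Wait until the fd looks readable, as usual.
    select.select([fd], [], [])
    # Arm a one-shot interval timer so SIGALRM interrupts the read if it blocks.
    signal.setitimer(signal.ITIMER_REAL, timeout)
    try:
        return os.read(fd, nbytes)
    except _ReadTimeout:
        return b""  # select lied; no data was actually available
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
```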

This is totally sucky but I can't see a correct alternative (other than using threads, like the windows version does).

<foom> yeah, so t.i.stdio is totally broken, it can't set nonblocking on random fds it inherits, because that's a property of the file description not file descriptor.
<exarkun> Okay, that's a property of the file description.  So?
<foom> you run a t.i.stdio app. it sets stdout to nonblocking mode. Now, stdout is nonblocking for all processes that share that same stream.
<exarkun> How many people want to share stdout among multiple processes?
<foom> e.g. you return back to the shell, now the shell's stdout is nonblocking
<exarkun> That'd be easily fixed by making them blocking again before exiting
<foom> unless you don't exit cleanly
<foom> or unless someone else is running concurrently
<exarkun> If you don't exit cleanly, you might leave stdout in an inconsistent state anyway
<exarkun> And concurrency is a different case :)
<foom> you might? how?
<glyph> foom: sys.stdout.write("\x1b")
<foom> at least random other programs won't fail with EAGAIN
<exarkun> I've never wanted to programmatically interact with stdout that was in concurrent use by another process
<exarkun> That's why I said it was an uncommon case
<foom> you've never run program & in a shell?
<foom> and had it print something?
<exarkun> not when `program` wanted to play with stdout
<exarkun> well, no, that's not true
<exarkun> I have
<exarkun> But whenever the program prints to stdout I get pissed off, foreground it and kill it.
<exarkun> Why would I want a background process talking to my stdio?  It's *background*!  The foreground process is the one that gets to talk to stdio.
<foom> so, in the meantime, since the shell printed something, it set the file back to blocking mode
<exarkun> Anyway, that's irrelevant - it's *possible* to share, I'm not denying that
<exarkun> But I really think it's very uncommon.
<foom> and now your twisted program is blocking on stdio
<foom> for the rest of its life
<foom> well, I just ran into this in real life.
<exarkun> Okay, so you probably want a fix :)
<foom> and it took quite a while to realize what the hell was going on
<exarkun> Before I was thinking that twisted.internet.stdio didn't need to change completely to accommodate an uncommon use-case
<foom> because I didn't even realize nonblockingness was a property of the file description
<exarkun> But now I'm thinking it can accommodate this use-case without changing completely
<exarkun> What if it could dup stdio and make the copies non-blocking?
<exarkun> Is that a fix?
<foom> no
<foom> dup doesn't copy the file description
<foom> it copies the file descriptor
<exarkun> okay
<foom> see, this is really evil.
<exarkun> shared mutable state usually is ;)
<itamar> what do other applications do?
<exarkun> itamar: I bet they break and die.
<exarkun> foom just said what shells do - set the fd back to blocking, but I bet that's not a reliable fix for them
<exarkun> (ie, what if twisted sets it non-blocking in between doing that and doing a read)
<foom> they do that in response to EAGAIN I believe
<foom> so it should work out okay for them. :)
<exarkun> around every read we do, we could lock the fd and set it non-blocking :)
<foom> lock it? how?
<exarkun> mandatory non-posix locking of course
<exarkun> (this is a joke)
<exarkun> bbiaf, coffee
<itamar> http://cr.yp.to/unix/nonblock.html
<foom> man, that sucks, but it sure would work.
<foom> select() would usually give the right answer, if it doesn't, the signal will break you out of the read.
<foom> it shouldn't even be that hard to fix t.i.stdio to do that.
<foom> sucksucksucksuck
<itamar> do we still want to set non-block?
<foom> no
<foom> you can't
<foom> it's basically incorrect to set nonblockingness on any fd you didn't create
<itamar> :(
twisted-trac commented 15 years ago
@glyph set owner to @jyknight

I don't think that setting a timer is a viable solution to this problem, mainly because itimers aren't available in any currently released version of Python, and alarm's resolution makes it unhelpful.

Signals also have that inherent race condition that everybody's favorite bug talks about. Of course, it's pretty unlikely that you'll schedule an itimer, get swapped out and stop executing, then get scheduled such that your timer fires before you struggle your way on to the read() call, and then block forever, but it's still possible.

I think threads are more likely to yield a correct solution. Start two threads: one for reading, one for writing. Use os.read to avoid file object concurrency problems. Leave them running as long as StandardIO is active, and deliver the data to the main thread. You should be able to easily nuke a wedged thread and cleanly exit just by closing the appropriate FD.
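
For concreteness, a minimal sketch of that thread-per-direction idea (reading side only); this is not the actual twisted.internet.stdio implementation, and the callback names are invented for illustration:

```python
import os
import threading

from twisted.internet import reactor


def start_stdin_reader(dataCallback, eofCallback, fd=0, chunkSize=8192):
    """Do blocking os.read() calls in a daemon thread and hand each chunk to
    the reactor thread; the fd's O_NONBLOCK flag is never touched."""
    def readLoop():
        while True:
            data = os.read(fd, chunkSize)  # plain blocking read
            if not data:  # EOF on the fd
                reactor.callFromThread(eofCallback)
                return
            reactor.callFromThread(dataCallback, data)

    thread = threading.Thread(target=readLoop, daemon=True)
    thread.start()
    return thread
```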

(Reassigning to the reporter because, while this poses an interesting intellectual challenge for me, I don't think there's anything we can really do about terminal sharing - and we are talking about terminals here, because what other FD could you possibly find yourself sharing? My comment about '\x1b' was serious. If you don't own the output stream, there's nothing you can do to prevent other programs from puking up their guts on your display and putting it into a completely undefined, unknowable state. Want to ask the terminal what state it's in with the vt100 interrogation protocol? Too bad! The terminal reported the cursor position, but then some random forked program moved it around before you could use that information. More closely related to this problem: who says your parent didn't set O_NONBLOCK on this FD? It created it, after all. No standard that I can find says that stdio needs to be blocking, it just usually is: our thread-based solution should be prepared to cope with other similarly naive programs setting O_NONBLOCK without breaking. A more realistic solution to the larger problem here would be to write a feature-complete clone of bash and screen, get the world to adopt a Twisted-based shell for everything, and then give every spawned process its own PTYs so that they can be independently managed. The underlying I/O model is just broken and there's pretty much nothing anyone can do about it.)

twisted-trac commented 13 years ago
Automation removed owner
twisted-trac commented 6 years ago
@njsmith commented

There's more discussion of this issue in https://github.com/python-trio/trio/issues/174, including some ideas that haven't been discussed here.

twisted-trac commented 4 years ago
@glyph commented

Anybody want to take this on? Node is gonna beat us to a fix on this! ;)

https://github.com/nodejs/node/commit/b5dda32b8a78c16453bb7c228878e474b8cd3461

twisted-trac commented 12 years ago
@itamarst commented

sendmsg()/recvmsg() with MSG_DONTWAIT might be a possible solution for this.
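
A hedged sketch of that idea (not Twisted code): MSG_DONTWAIT makes an individual call non-blocking without touching the O_NONBLOCK flag on the shared open file description, but it only helps when the stdio fd is actually a socket (recv()/send() don't work on pipes or ttys), and the flag isn't part of POSIX.

```python
import socket


def read_once_nonblocking(fd, nbytes=8192):
    # Wrap the existing fd without duplicating or reconfiguring it; this
    # raises OSError if the fd is not a socket.
    sock = socket.socket(fileno=fd)
    try:
        return sock.recv(nbytes, socket.MSG_DONTWAIT)
    except BlockingIOError:
        return None  # nothing available right now
    finally:
        sock.detach()  # hand the fd back without closing it
```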

twisted-trac commented 15 years ago
@jyknight commented

Also http://homepages.tesco.net/J.deBoynePollard/FGA/dont-set-shared-file-descriptors-to-non-blocking-mode.html seems like a nice summary of the issue.

twisted-trac commented 15 years ago
@jyknight commented

I agree that threads are the only way to do it. However, closing the fd is not a viable way to shut the threads down: 1) it doesn't appear to interrupt a read() that's already in progress, and 2) it's really bad to do in a multithreaded program, where another thread could come along and open a new file that gets assigned the same fd number.

This (Python 2-era) demo program seems to demonstrate a method that might work to shut down the threads. The choice of signal is irrelevant, as long as it has a handler installed and isn't set up with SA_RESTART.

```python
import threading, os, time, signal, ctypes, errno

# Install a do-nothing handler so the signal interrupts syscalls with EINTR
# instead of killing the process.
signal.signal(signal.SIGCHLD, lambda *args: 1)
running = True

def t_func(fd):
    global thread_id
    # Record the worker thread's pthread id so the main thread can signal it.
    thread_id = ctypes.pythonapi.pthread_self()
    print "started"
    try:
        while running:
            print os.read(fd, 100)
    except OSError, e:
        # EINTR means the read was interrupted by our signal; anything else
        # is a real error.
        if e.errno != errno.EINTR:
            raise
    print "ended"

t = threading.Thread(target=lambda: t_func(0))
t.start()

time.sleep(1)
running = False
# Interrupt the blocking os.read() in the worker thread.
ctypes.pythonapi.pthread_kill(thread_id, signal.SIGCHLD)
print "killed"
t.join()
print "joined"
```
twisted-trac commented 15 years ago
@jyknight commented

PS: the scenario in which I ran into this had nothing at all to do with terminals. It's in a test suite.

There's a program whose only real job is to shuttle stdio to and from a socket; it uses non-blocking I/O for this. There's a server, which this program connects to, which writes some stuff to stdout.

The first program (in this instance) gets run with its stdin being a dedicated pipe and its stdout unchanged, so stdout points at the test runner's log, which just so happens to also be a pipe. The output of this command is empty, or nearly so; the command is being run for its side effects. But stdin and stdout both get set non-blocking.

The server then logs some data to stdout, and pukes its guts out with an unexpected EINTR.

I'm pretty sure the opposite problem was happening to me too: that in some case, the streams were getting set back to blocking out from under the nbio-expecting process, thus causing it to hang. But that's just a guess; I don't have a strace log of that happening.

This has just about nothing to do with terminals, except that it breaks terminals even worse than other things, as a terminal shares this flag between stdin, stdout, and stderr. So a t.i.stdio-using program which ever spawns a shell as a subprocess, sharing any of the streams (e.g. just stderr, as might be common...), is also totally broken, because the shell will set the terminal (all of stdin, stdout, and stderr) back to blocking.

twisted-trac commented 14 years ago
ezyang commented

Got bit by this bug. CC'ed.

twisted-trac commented 13 years ago
@itamarst commented

So we don't forget: while ticket #2259 is fixed, the epoll reactor still doesn't support redirecting stdout or stderr to a file. Fixing this bug would solve that.
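
A small illustration (not Twisted code) of why that redirection breaks a reactor that registers stdio fds with epoll: epoll_ctl() refuses regular files with EPERM, so a stdout that points at a file can never be added to the epoll set.

```python
import select
import tempfile

ep = select.epoll()
with tempfile.TemporaryFile() as f:
    try:
        ep.register(f.fileno(), select.EPOLLIN)
    except PermissionError:
        print("epoll cannot watch a regular file (EPERM)")
ep.close()
```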

twisted-trac commented 13 years ago
@exarkun commented

> Fixing this bug would solve that.

If the fix takes the form of never putting stdio file descriptors into the reactor, at least. There might be other ways to fix it.

Also see #4429 for the remaining epollreactor issue.