Closed bitprophet closed 7 years ago
@tuukkamustonen - what exact OS, shell, and versions of each are you seeing this bug under? I'm assuming it'll affect most/all POSIX shells, but you never know...
I'm running:
4.3.11(1)-release (x86_64-pc-linux-gnu)
.Also tried zsh (5.0.2
) and the result is similar.
Started trying to reproduce this locally...I can kinda do it but I actually get a nastier looking error when I try to fg
after the backgrounded process complains via shell job control. This is the same even back on 0.16.3 (i.e. before the changes from #425).
» python backgrounding.py &
[1] 62549
sleeping...
» running command...
[1] + suspended (tty output) python backgrounding.py
» fg
[1] + continued python backgrounding.py
jforcier
Traceback (most recent call last):
File "backgrounding.py", line 9, in <module>
run('whoami')
File "/Users/jforcier/Code/oss/invoke/invoke/__init__.py", line 38, in run
return Context().run(command, **kwargs)
File "/Users/jforcier/Code/oss/invoke/invoke/context.py", line 78, in run
return runner_class(context=self).run(command, **kwargs)
File "/Users/jforcier/Code/oss/invoke/invoke/runners.py", line 259, in run
return self._run_body(command, **kwargs)
File "/Users/jforcier/Code/oss/invoke/invoke/runners.py", line 353, in _run_body
raise ThreadException(thread_exceptions)
invoke.exceptions.ThreadException:
Saw 1 exceptions within threads (error):
Thread args: {'kwargs': {'echo': None,
'input_': <open file '<stdin>', mode 'r' at 0x10dc640c0>,
'output': <open file '<stdout>', mode 'w' at 0x10dc64150>},
'target': <bound method Local.handle_stdin of <invoke.runners.Local object at 0x10e515c90>>}
Traceback (most recent call last):
File "/Users/jforcier/Code/oss/invoke/invoke/util.py", line 156, in run
super(ExceptionHandlingThread, self).run()
File "/usr/local/var/pyenv/versions/2.7.12/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/jforcier/Code/oss/invoke/invoke/runners.py", line 630, in handle_stdin
with character_buffered(input_):
File "/usr/local/var/pyenv/versions/2.7.12/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
File "/Users/jforcier/Code/oss/invoke/invoke/platform.py", line 119, in character_buffered
tty.setcbreak(stream)
File "/usr/local/var/pyenv/versions/2.7.12/lib/python2.7/tty.py", line 36, in setcbreak
tcsetattr(fd, when, mode)
error: (4, 'Interrupted system call')
At a glance, my shell (zsh 5.2
on OS X Sierra; it's the system zsh, not from brew) is giving us a tiny bit more info than Tuukka's, with the suspended (tty output)
as explanation for the job control state change.
And well, yea - run('whoami')
would by default be mirroring the subprocess' stdout/err to the controlling TTY's stdout/err. So at one level maybe this isn't even technically a bug, but "backgrounding something trying to write to stdout doesn't work well generally"? But I suspect that's not the end of it by any means.
There's a few ways I can go from here, brainstorming:
(inv blah)& wait
as opposed to simply inv blah &
tty.setcbreak
on them...then again, that presumably occurred on the foregrounding where the TTY is present...)
^Z
+ bg
? Probably the exact same problem.At a quick glance the Interrupted system call
is symbolic of receiving an EINTR
event from, I'm betting, one of our read or write calls. Not sure offhand whether it would've been interrupted while backgrounded (and we then only see the traceback upon foregrounding) or if the act of foregrounding is the trigger. (See note above about needing to grok all this more.)
Definitely need more instrumentation to log file to figure out exactly what's happening from the perspective of the Invoke process during this particular flow.
On my Trusty VM (and zsh 5.0.2
) I get the foregrounding behavior, again with suspended (tty output)
, but as with Tuukka I do not get an EINTR related traceback; instead the foregrounded process just continues happily.
(As an aside, unless it ends up being fixed by an overall fix, I will leave that OS X specific issue as another issue, as I expect this use case to be chiefly encountered on Linux-oriented build systems...)
Starting to poke more:
hide=True
, meaning no stream mirroring, to the run
calls makes no difference, so it's likely that this is related to our manipulation of stdin
and not our writes to stdout
/`stderr0.11.1
doesn't have the issue either way and that 0.12.0
does; 0.12 made lots of changes to stdout/err/in mirroring (including forcing stdin to be bytewise) in order to implement autoresponding, so...yea. it's somewhere in that big pile of stuff.platform.character_buffered
, and after a presumably successful not isatty(stream)
test, meaning the stream appears to be a TTY.
stream.isatty()
is returning True
, where stream
is the usual <open file '<stdin>', ...>
(aka a Python 2 sys.stdin
).tty.setcbreak
or tty.tcgetattr
calls is sufficient to make the shell get mad about things. Not really surprising though.sys.stdin
is backgrounded?" which brings me right back to "What exactly is backgrounding doing in the first place?"&!
(which is actually a mite odd, as i still see the output, so clearly the tty attachment is identical, simply the job stops being controllable via job control. It remains a child of zsh
too, instead of becoming a child of init
as I was expecting. Meh.)setcbreak
, and prior to that point, both character_buffered
, isatty(stdin)
and has_fileno(stdin)
- our current tests for tty-like things - say True
. So as I suspected they won't help as-is.
setcbreak
out; we just hang on tcsetattr
(not tcgetattr
, btw) and even at that point, stdin passes our tests.Note: while this was obvious from the messages my shell has been sending me, it's laid out very clearly in man zshall
:
A job being run in the background will suspend if it tries to read from the terminal.
So unless we can find a useful test to detect backgrounding & pre-emptively avoid touching stdin
, there may be no fix for this besides documentation/education. I.e. recommend that users run backgrounded processes with an explicitly disassociated stdin; because as expected, invoke-thing < /dev/null &
backgrounds fine and behaves exactly like the pre-0.12 use case, printing everything happily and never being paused.
Rummaged around on the stdin obj and in the os
module but didn't see anything clearly helpful...decided to google before rummaging around harder in eg termios
, and this SO post (because there's always an SO post...) shows that os.tcgetpgrp
may be the key:
os.getpgrp
does the same for the current Python processSure enough, a basic test:
» python -c "import os; print(os.getpgrp()); import sys; print(os.tcgetpgrp(sys.stdin.fileno()))"
8848
8848
» python -c "import os; print(os.getpgrp()); import sys; print(os.tcgetpgrp(sys.stdin.fileno()))" &
[1] 8881
8881
1338
[1] + done python -c
Note how the foreground test has same PGID, but background test shows stdin is attached to PID 1338 (which is the zsh process this is all running in).
Time to see if this can be put into practice inside Invoke...
Yup, pretty easy, seems to work as advertised, and now things work "correctly" again. Also, kind of expectedly, it fixes the fully-disowned use case as well.
I gotta figure out how to test this, it's an extremely persnickety thing to test automatically, esp given we like to run our tests via ourselves...which naturally run in a headless state...
@tuukkamustonen In the meantime I'd love a confirmation that the commit showing up above, also fixes this in your real world CI server use case, and not just this base case!
That sounded like a fun trip! :)
I can confirm your magic doing the trick - background tasks no longer halt in my local bash. No change in behavior in Jenkins (because it was already working there).
Also tried sub-shell/background invocation in Jenkins and it worked. I did not run the exact same commands as earlier (because I switched to GNU parallel
for concurrent execution), but I am not doing anything special there so I think your fix just works.
(Bash and zsh seem to report the completion of a background command in a slightly different way, but yeah, that's just difference in shells, not something related to invoke
.)
Awesome, glad to hear it!
Just gotta test this...I tried my hand at an integration level test yesterday but it was hard, as expected, due to not having a real terminal handy (which is compounded on Travis, to boot). Think I will just put in a unit-style test instead; less than ideal, but eh.
See bottom half of this comment as well as this later comment.
tl;dr something we're doing in Runner - I'm guessing some of the ioctl junk but not really sure - is messing with shell backgrounding such that the process has to be foregrounded again to continue execution.
It only occurs when a real pty is attached; headless processes (like the kind that used to break before #425's main issue was fixed) don't exhibit it. Which is why I'm guessing it's to do with the "passes isatty/has_fileno tests" code branches.
Issue started around 0.12.x in case that helps us narrow it down.