Closed christopherhesse closed 4 years ago
I'm going to try enabling that kill code and see if the problem goes away.
Thanks for reporting this issue! IIRC I did use a timeout at first, but the sending of "q" proved to be very reliable. Your case seems to suggests that it is not :)
I'm going to try enabling that kill code and see if the problem goes away.
Any luck with that?
I changed https://github.com/imageio/imageio-ffmpeg/blob/baa539dc53ef3163301088abcdfa8f509ef653e6/imageio_ffmpeg/_io.py#L192 to say if False
and haven't seen any more hanging issues. The only issue is that it prints more kill warnings from line 208, but since I have warnings disabled that doesn't matter as much.
I would love a new version of ffmpeg with the SIGINT
kill mode so I can remove my monkeypatch solution.
If sending q is actually better in some objective way, it could use a timeout to communicate and fallback to sigint if that timeout is reached.
cc @dennisvang, we seem to be a bit stuck here. Since you once proposed to use p.communicate(b'q')
to terminate the ffmpeg process, I was hoping you might have some ideas :)
It looks like that call to communicate()
is hanging. IFAIK that call does:
@almarklein, sorry for the very late reply.
I have been having some second thoughts about the communicate(b'q')
call myself.
At the time, the communicate(b'q')
approach seemed to fix my immediate problem (https://github.com/imageio/imageio/issues/343), and on my side it does work, in principle.
However, since that time, I have also encountered blocking issues related to communicate
that appear to be random.
In my case I don't see any indefinite blocking, but on occasion the communicate(b'q')
call will block for several seconds (up to approx. 10 seconds). This seems to occur only with some usb-webcams, such as my laptop's builtin cam. I haven't seen the issue with some other webcams. Not sure what is causing it.
Looking at the source for communicate()
, there is quite a lot going on there, including some additional threading.
Taking a step back: my reason for using the communicate()
to send b'q'
was that I wanted to play safe, taking into account the warning from the subprocess docs against using stdin.write
with pipes.
Now, on second thought, that may not be necessary, considering that imageio is already reading from the pipe (see e.g. https://stackoverflow.com/a/30984882).
At this point I am playing with an alternative, which is to replace p.communicate(b'q')
with:
p.stdin.write(b'q')
p.stdin.close()
p.wait()
The above is roughly equivalent to the python 2.7 implementation of communicate()
, without the additional threads for reading stdout
and stderr
.
That appears to solve my specific problem, although I cannot yet be certain it does, and I'm not sure if it helps with the current issue (#17).
Note: stdin.close()
also flushes (see https://docs.python.org/3/library/io.html#io.IOBase.close)
Note: the timeout approach seems like a good one too, but as I'm working with legacy Python 2.7 code, I haven't been able to test that. (the timeout
argument was introduced in Python 3.3 I believe... edit: it is present in subprocess32
...)
@dennisvang thanks for the update!
I merged #19 today, but that just makes ffmpeg getting killed most of the time. I dug around some more (e.g. what communicate
does, and I think I've got it:
p.stdin.write(b"q")
p.stdin.close()
p.stdout.close()
The closing of the stdout
seems to be the missing piece. In contrast to communicate
this does not try to read from stdout, so no chance of hanging here.
Seems to work on both Windows and Linux. Will do some more testing and make it part of #29 tomorrow.
Also see this comment in #20 to see why apparently p.communicate()
should probably be avoided (here).
I'm getting a weird hang when using this format in imageio (0.3.0).
It looks like it's blocking forever trying to send "q" to the child process, should that have a timeout on it? I'm not sure what causes this issue but it seems to happen unreliably in my script. Creating a reduced version would be challenging because it is rare but the script runs for a long time.
I see that there is some code to just kill the subprocess, but that code is disabled.