imageio / imageio-ffmpeg

FFMPEG wrapper for Python
BSD 2-Clause "Simplified" License
233 stars 52 forks source link

Hang in subprocess communicate #17

Closed christopherhesse closed 4 years ago

christopherhesse commented 5 years ago

I'm getting a weird hang when using this format in imageio (0.3.0).

py-spy --dump --pid 43792
Thread 0x7FA50A5B7700 (active)
         run (imageio_ffmpeg/_parsing.py:61)
         _bootstrap_inner (threading.py:917)
         _bootstrap (threading.py:885)
Thread 0x7FA57CB96700 (active)
         _communicate (subprocess.py:1701)
         communicate (subprocess.py:939)
         read_frames (imageio_ffmpeg/_io.py:193)
         _close (imageio/plugins/ffmpeg.py:342)
         close (imageio/core/format.py:252)
....

It looks like it's blocking forever trying to send "q" to the child process, should that have a timeout on it? I'm not sure what causes this issue but it seems to happen unreliably in my script. Creating a reduced version would be challenging because it is rare but the script runs for a long time.

I see that there is some code to just kill the subprocess, but that code is disabled.

christopherhesse commented 5 years ago

I'm going to try enabling that kill code and see if the problem goes away.

almarklein commented 5 years ago

Thanks for reporting this issue! IIRC I did use a timeout at first, but the sending of "q" proved to be very reliable. Your case seems to suggests that it is not :)

I'm going to try enabling that kill code and see if the problem goes away.

Any luck with that?

christopherhesse commented 5 years ago

I changed https://github.com/imageio/imageio-ffmpeg/blob/baa539dc53ef3163301088abcdfa8f509ef653e6/imageio_ffmpeg/_io.py#L192 to say if False and haven't seen any more hanging issues. The only issue is that it prints more kill warnings from line 208, but since I have warnings disabled that doesn't matter as much.

I would love a new version of ffmpeg with the SIGINT kill mode so I can remove my monkeypatch solution.

christopherhesse commented 5 years ago

If sending q is actually better in some objective way, it could use a timeout to communicate and fallback to sigint if that timeout is reached.

almarklein commented 5 years ago

That should work: https://docs.python.org/3.7/library/subprocess.html#subprocess.Popen.communicate

almarklein commented 5 years ago

cc @dennisvang, we seem to be a bit stuck here. Since you once proposed to use p.communicate(b'q') to terminate the ffmpeg process, I was hoping you might have some ideas :)

It looks like that call to communicate() is hanging. IFAIK that call does:

dennisvang commented 4 years ago

@almarklein, sorry for the very late reply.

I have been having some second thoughts about the communicate(b'q') call myself.

At the time, the communicate(b'q') approach seemed to fix my immediate problem (https://github.com/imageio/imageio/issues/343), and on my side it does work, in principle. However, since that time, I have also encountered blocking issues related to communicate that appear to be random.

In my case I don't see any indefinite blocking, but on occasion the communicate(b'q') call will block for several seconds (up to approx. 10 seconds). This seems to occur only with some usb-webcams, such as my laptop's builtin cam. I haven't seen the issue with some other webcams. Not sure what is causing it.

Looking at the source for communicate(), there is quite a lot going on there, including some additional threading.

Taking a step back: my reason for using the communicate() to send b'q' was that I wanted to play safe, taking into account the warning from the subprocess docs against using stdin.write with pipes.

Now, on second thought, that may not be necessary, considering that imageio is already reading from the pipe (see e.g. https://stackoverflow.com/a/30984882).

At this point I am playing with an alternative, which is to replace p.communicate(b'q') with:

p.stdin.write(b'q')
p.stdin.close()
p.wait()

The above is roughly equivalent to the python 2.7 implementation of communicate(), without the additional threads for reading stdout and stderr.

That appears to solve my specific problem, although I cannot yet be certain it does, and I'm not sure if it helps with the current issue (#17).

Note: stdin.close() also flushes (see https://docs.python.org/3/library/io.html#io.IOBase.close)

Note: the timeout approach seems like a good one too, but as I'm working with legacy Python 2.7 code, I haven't been able to test that. (the timeout argument was introduced in Python 3.3 I believe... edit: it is present in subprocess32...)

almarklein commented 4 years ago

@dennisvang thanks for the update!

almarklein commented 4 years ago

I merged #19 today, but that just makes ffmpeg getting killed most of the time. I dug around some more (e.g. what communicate does, and I think I've got it:

                    p.stdin.write(b"q")
                    p.stdin.close()
                    p.stdout.close()

The closing of the stdout seems to be the missing piece. In contrast to communicate this does not try to read from stdout, so no chance of hanging here.

Seems to work on both Windows and Linux. Will do some more testing and make it part of #29 tomorrow.

almarklein commented 4 years ago

Also see this comment in #20 to see why apparently p.communicate() should probably be avoided (here).