sylikc / pyexiftool

PyExifTool (active PyPI project) - A Python library to communicate with an instance of Phil Harvey's ExifTool command-line application. Runs one process with special -stay_open flag, and pipes data to/from. Much more efficient than running a subprocess for each command!

Too many file errors cause library to hang #96

Open ninas opened 3 months ago

ninas commented 3 months ago

Running this on Arch Linux, with Exiftool 12.93

From playing around, this seems to happen at around 350 files with errors.

Repro:

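import exiftool

# Placeholder paths that do not exist; roughly 350 or more were needed to trigger the hang
files = [f"/nonexistent/file_{i}.jpg" for i in range(400)]
with exiftool.ExifToolHelper() as et:
    et.get_metadata(files)
# hangs indefinitely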

With a lower number of file errors, the exception is thrown as expected, and the error message is provided.

ExifTool itself handles this case without issue. I'm guessing it's similar to the other issue I reported, https://github.com/sylikc/pyexiftool/issues/95, where it's either hanging while waiting for input or stuck in an infinite loop because some condition is never met.
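One general way a "hanging while waiting" situation can arise with piped child processes is a full pipe buffer: if the child writes more to stderr than the pipe can hold and nobody reads it, the child blocks and never reaches its end-of-output marker. Whether that is what happens inside pyexiftool is not established in this thread. The sketch below only demonstrates the mechanism with a stand-in Python child process; `CHILD` and `child_finishes` are hypothetical names, and no ExifTool is involved.

```python
import subprocess
import sys

# Stand-in child process (NOT ExifTool): writes n bytes to stderr, then a
# "{ready}" marker to stdout, then exits.
CHILD = r"""
import sys
sys.stderr.write("x" * int(sys.argv[1]))
sys.stderr.flush()
sys.stdout.write("{ready}\n")
sys.stdout.flush()
"""


def child_finishes(n_stderr_bytes, timeout=3.0):
    """Return True if the child exits within `timeout` while nobody reads its pipes."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(n_stderr_bytes)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # The child is stuck writing to a full stderr pipe.
        return False
    finally:
        proc.kill()         # harmless if the child already exited
        proc.communicate()  # drain and close both pipes


if __name__ == "__main__":
    print(child_finishes(1_000))            # small stderr output
    print(child_finishes(8 * 1024 * 1024))  # far more than a pipe buffer holds
```

On Linux the default pipe buffer is 64 KiB, so a few hundred error lines of a couple of hundred bytes each could plausibly reach it, which would be in the same ballpark as the ~350 files reported above. That is a guess, not a diagnosis.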

sylikc commented 2 months ago

Just like in the other thread, can you use https://sylikc.github.io/pyexiftool/faq.html#i-m-getting-an-error-how-do-i-debug-pyexiftool-output to debug where exactly it's hanging? Put some print() statements before and after the get_metadata call, and outside of the with block. I'm wondering if something else is going on.

I've definitely run many file errors (on Windows) and didn't get any hang, but there could be something else going on.
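The debugging steps suggested above could look roughly like the sketch below. It assumes the `logger=` keyword of `ExifToolHelper` as described in the linked FAQ; the logger name and the `traced_get_metadata` helper are made up for illustration.

```python
import logging

# Dedicated logger so pyexiftool's debug output is easy to spot.
logger = logging.getLogger("pyexiftool_debug")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())


def traced_get_metadata(files):
    """Run get_metadata with debug logging and print() markers around each step."""
    import exiftool  # pyexiftool; imported here so the module loads without it

    print("before with-block")
    with exiftool.ExifToolHelper(logger=logger) as et:
        print("before get_metadata")
        try:
            result = et.get_metadata(files)
        finally:
            print("after get_metadata")  # never printed if the call hangs
    print("after with-block")
    return result
```

The last marker that shows up in the output narrows down whether the hang is inside the call itself or while leaving the with block.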

ninas commented 2 months ago

The output looks a lot like https://github.com/sylikc/pyexiftool/issues/95 again, and it freezes on the same line again: https://github.com/sylikc/pyexiftool/blob/e54f96cd75758096f72bc97c42390f1f9fef8010/exiftool/exiftool.py#L132

Sample code: https://gist.github.com/ninas/6f6b01a46b1eb7fd51d8df3af08ace97 I also tried a bunch of other ways of writing the code in case it was instantiating ExifToolHelper multiple times that was causing this issue (caching it, looping inside the with statement, etc.), but got the same result.

Log output with exiftool logging enabled and my print statements before and after the call: test.log

The number of files it takes before it gets stuck seems to depend on how interleaved the existing and non-existing files are. As you can see in the log file, there's first a run that succeeds (contains existing and missing files), and then a second one that freezes (all missing files). If I move the existing files to the end of the file list, the first run will freeze.

Also, not sure if this is intentional, but in the cases where there are missing files and existing files, an exception is raised. The unexpected bit is that the exception's stdout is the exif results for the files that were successful. I'm not sure if this is documented anywhere (I couldn't find it in the docs, just came across it in practice).
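For anyone who wants to use those partial results, here is a minimal sketch. It assumes the exception is `exiftool.exceptions.ExifToolExecuteError`, that it exposes a `stdout` attribute, and that the text in it is ExifTool's JSON output; none of that is confirmed as documented behavior in this thread. `partial_results` and `get_metadata_tolerant` are hypothetical helper names.

```python
import json


def partial_results(stdout):
    """Parse whatever JSON was printed before the error; return [] if there is none."""
    text = (stdout or "").strip()
    if not text:
        return []
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return []


def get_metadata_tolerant(files):
    """Return metadata for the files that could be read, ignoring missing ones."""
    import exiftool  # pyexiftool; imported here so the module loads without it
    from exiftool.exceptions import ExifToolExecuteError

    with exiftool.ExifToolHelper() as et:
        try:
            return et.get_metadata(files)
        except ExifToolExecuteError as err:
            # Assumption: err.stdout holds the JSON for the files that succeeded.
            return partial_results(err.stdout)
```

The parsing is kept in its own function so it can be checked without ExifTool installed, for example by calling `partial_results('[{"SourceFile": "a.jpg"}]')` with a made-up record.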