philip-bl closed this issue 1 year ago.
create_tiff_file.py:

```python
#!/usr/bin/env python
import numpy as np
import PIL.Image as Image

# Create a really big file so it takes a while to load. (An even bigger file
# would make libtiff complain that we are trying to zip-bomb it.)
Image.fromarray(np.random.rand(5000, 30000) > 0.9) \
    .save("example.tif", format="TIFF", compression="tiff_lzw")
```
reproduce.py:

```python
#!/usr/bin/env python
from threading import Thread
import os
from collections import deque

finish = False

def load_files():
    descriptors = deque()
    while not finish:
        try:
            fd = os.open("test", os.O_RDONLY)
            descriptors.append(fd)
        except OSError:  # in case we run out of file descriptors
            os.close(descriptors.pop())
    for d in descriptors:
        os.close(d)
    print("success")

t = Thread(target=load_files)

import PIL.Image as Image

print("Right before the with-block", flush=True)
with Image.open("example.tif") as mask:
    print("Right in the beginning of the with-block", flush=True)
    t.start()
    mask.load()  # replace it with time.sleep(10) and it works just fine without exceptions
    print("Right after load()", flush=True)
print("Right after the with-block", flush=True)
finish = True
t.join()
```
Steps to reproduce:

```shell
touch test
./create_tiff_file.py
./reproduce.py
```
Note: you need to delete example.tif and test before every run. Otherwise the OS might cache the files, libtiff will load the file too fast, and the problem won't reproduce.
Expected behavior: load_files outputs "success".

Actual behavior:

```
Right in the beginning of the with-block
Right after load()
Right after the with-block
Exception in thread Thread-1 (load_files):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "reproduce.py", line 17, in load_files
    os.close(d)
OSError: [Errno 9] Bad file descriptor
```
What happens:

1. The TIFF loader creates a duplicate of the file descriptor for use by the external C library code:
   https://github.com/python-pillow/Pillow/blob/5832362215f5fa661fa05ad31e6c97aab87d410a/src/PIL/TiffImagePlugin.py#L1256
2. As the comment there says, "libtiff closes the file descriptor, so pass in a dup." So libtiff probably closes fp after decoding here:
   https://github.com/python-pillow/Pillow/blob/5832362215f5fa661fa05ad31e6c97aab87d410a/src/PIL/TiffImagePlugin.py#L1298
3. Pillow closes fp again:
   https://github.com/python-pillow/Pillow/blob/5832362215f5fa661fa05ad31e6c97aab87d410a/src/PIL/TiffImagePlugin.py#L1308

If another thread decides to open a new file between steps 2 and 3 (like load_files in the example above), it may get the descriptor number that libtiff just closed. Then, in step 3, Pillow closes that file, invalidating the other thread's descriptor.
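The descriptor recycling at the heart of this race can be sketched outside Pillow. This is a minimal illustration, not Pillow code; it assumes a POSIX system where /dev/null and /dev/zero exist and the kernel hands out the lowest-numbered free descriptor:

```python
import os

fd = os.open("/dev/null", os.O_RDONLY)
dup_fd = os.dup(fd)      # the duplicate that would be handed to libtiff

os.close(dup_fd)         # "libtiff" closes the duplicate

# Before the second close happens, another thread opens a file. POSIX
# hands out the lowest-numbered free descriptor, so it gets the
# recycled number back.
other_fd = os.open("/dev/zero", os.O_RDONLY)
print(other_fd == dup_fd)  # True: same number, now a different file

os.close(dup_fd)         # "Pillow's" second close hits the other thread's
                         # file; that thread's own close() now raises EBADF
try:
    os.close(other_fd)
except OSError as e:
    print(e.errno)       # 9 (EBADF), like in the traceback above

os.close(fd)
```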
This fixes the bug, but I'm not sure that libtiff will always close the descriptor, e.g. if some error occurs:

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..d5ea99075 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1296,6 +1296,8 @@ class TiffImageFile(ImageFile.ImageFile):
             self.fp.seek(0)
             # 4 bytes, otherwise the trace might error out
             n, err = decoder.decode(b"fpfp")
+            # fp should be closed after the call to decode
+            fp = None
         else:
             # we have something else.
             logger.debug("don't have fileno or getvalue. just reading")
```
Because it's unclear whether libtiff will close the file descriptor or not, what about this as an alternative solution? It first checks whether the file descriptor is still open by trying to read 0 bytes. I've tested it against create_tiff_file.py and reproduce.py and it works.

```diff
diff --git a/src/PIL/TiffImagePlugin.py b/src/PIL/TiffImagePlugin.py
index 3d4d0910a..97d184f5f 100644
--- a/src/PIL/TiffImagePlugin.py
+++ b/src/PIL/TiffImagePlugin.py
@@ -1305,6 +1305,9 @@ class TiffImageFile(ImageFile.ImageFile):
         if fp:
             try:
+                # Test if the file descriptor is still open
+                os.read(fp, 0)
+
                 os.close(fp)
             except OSError:
                 pass
```
I reproduced the problem with this fix. Unfortunately, the reproduction code doesn't work in all cases, and I don't know how to make a better version without hooking something in libtiff. I can only suggest storing the file "test" on some very slow drive to ensure the read is really slow.

How this can still lead to a bug:

1. Pillow calls os.dup on self.fd and gets a descriptor, let's say 4.
2. libtiff calls close(4).
3. Another thread opens a file and gets descriptor 4.
4. Pillow calls os.read(4, 0), which succeeds because descriptor 4 now refers to that other file.
5. Pillow closes descriptor 4, invalidating the other thread's descriptor.

I checked the libtiff code and it should always close the descriptor after performing the decode. It will stay open only if libtiff contains a bug (but that would probably lead to much more serious errors anyway):
https://github.com/python-pillow/Pillow/blob/main/src/libImaging/TiffDecode.c#L723
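The reason the os.read(fp, 0) liveness check can be defeated is easy to demonstrate in isolation. A sketch, not Pillow code, assuming a POSIX system with /dev/null and /dev/zero:

```python
import os

a = os.open("/dev/null", os.O_RDONLY)   # the descriptor libtiff will close
os.close(a)                             # libtiff's close
b = os.open("/dev/zero", os.O_RDONLY)   # another thread's open recycles the number
print(b == a)                           # True on POSIX: lowest free fd is reused

# The liveness check passes even though the number now refers to a
# completely different file, so Pillow would still close it.
print(os.read(a, 0) == b"")             # True: reading 0 bytes "works"

os.close(b)
```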
If libtiff may not close the fd in some cases, the proper fix would be to either:
As to why we are closing the file pointer ourselves in the first place, it is because of #5936.

As to why we're duping the file pointer in the first place, I think the original logic was: one option was to use Python/our code for all the byte shuffling at the file level. IIRC there was a pretty decent performance and complexity improvement, in the cases where we did have a file pointer, from sending it into libtiff and letting it manage all of that.

(edit: yeah, that's definitely my comment style)
I met the same problem. Agree with @nurumaik. A workaround is changing the code from

```python
im = Image.open(file_path)
```

to

```python
im = Image.open(io.BytesIO(open(file_path, "rb").read()))
```
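A self-contained variant of this workaround with explicit resource management (a sketch; the tiny TIFF generated here just stands in for a real input file):

```python
import io
from PIL import Image

# Create a small LZW-compressed TIFF to stand in for a real input file
Image.new("L", (64, 64)).save("tiny.tif", format="TIFF", compression="tiff_lzw")

# Read the bytes up front so Pillow never hands an OS-level file
# descriptor to libtiff; the fd-recycling race then cannot occur.
with open("tiny.tif", "rb") as f:
    data = f.read()
with Image.open(io.BytesIO(data)) as im:
    im.load()
    print(im.size)  # (64, 64)
```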
@crazyyao0 do you have a different way to reproduce the bug that you could share in a self-contained example?
To be clear, applying the suggested fix from https://github.com/python-pillow/Pillow/issues/7042#issuecomment-1489020934 does cause the scenario from #5936 to fail, which would otherwise have passed.
I've created PR #7199 to stop duplicating the file pointer in the first place. Does that resolve this?
I don't think this issue is fixed completely. I can still reproduce the "Bad file descriptor" error both on Windows and Linux in heavily multi-threaded applications, e.g. with the following script:
```python
import argparse
import tempfile
import threading

from PIL import Image

parser = argparse.ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args()

def test():
    with Image.open(args.filename) as im, tempfile.NamedTemporaryFile('wb') as f:
        im.save(f, format="TIFF", compression="tiff_lzw")

threads = []
for i in range(32):
    t = threading.Thread(target=test, daemon=True)
    t.start()
    threads.append(t)
for t in threads:
    t.join()
```
Run it with the attached example tiff (had to gzip it here so that GitHub would accept it), 3-pages.tiff.gz, and you will see many "Bad file descriptor" error messages. It seems that the save() method is responsible for the double fd close here. I tested it with the latest stable release 10.4.0 with the binary wheels from PyPI.

Maybe increasing the number of threads could help? Since it's a race condition, the issue doesn't happen on every run; on my local machine maybe half the time. This is what repeated runs look like on Windows with Python 3.12 and Pillow 10.4.0:
I've created #8458 to address this.
Here are some wheels for you to test if you would like to verify the fix - wheels.zip
Looks good to me. I tested with the provided pillow-11.0.0.dev0-cp312-cp312-win_amd64.whl and couldn't reproduce the issue anymore.
If I open an LZW-compressed binary TIFF file using Pillow and load its data, Pillow calls the "close" syscall twice on one of the two relevant file descriptors. The second call results in an "EBADF (Bad file descriptor)" error. The program doesn't crash and everything seems to work OK, but if multithreading is used and a race condition happens, this can crash the program.

In this example, the expected behaviour would be a single "close" syscall on each of the two relevant file descriptors and no "EBADF" error. Below, I show how to reproduce the problem.

Also, I want to note that this once caused my training of a neural network to crash. It happened because PyTorch's DataLoader uses multithreading and a race condition occurred.