Thanks for the report!
We will be looking into fixing this!
Thanks for the information @apisutilis, I'll take a detailed look into this one!
It seems the error happens when the PNG reading function tries to destroy the PNG read structure after catching the error. In other words, torchvision does catch the libpng error, but it then segfaults when calling png_destroy_read_struct, which in turn calls https://github.com/glennrp/libpng/blob/a37d4836519517bdce6cb9d956092321eca3e73b/pngread.c#L948, where png_free is an alias to free. This suggests the error is related to memory management. I checked whether big_row_buf was NULL, but it wasn't.
In my reproduction scenario, torchvision was able to load the image once, but the second call caused the segfault and produced the message libpng error: IDAT: bad parameters to zlib. According to this issue https://github.com/ContinuumIO/anaconda-issues/issues/7315, this might be related to the version of zlib used when libpng is invoked. A user there commented that the segfault occurred on the second call to libpng, which is the same scenario we are seeing now.
The solution proposed there involves downgrading zlib (which I haven't verified myself). I'll try to compile zlib as well as libpng to see if we can get more information.
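As a complementary check that needs no rebuild, the file's compressed data can be inflated with Python's own zlib, independently of the libpng/zlib pair that torchvision links against. A minimal sketch, assuming the suspect file fits in memory; iter_chunks and check_idat_stream are hypothetical helpers, not part of torchvision. It walks the chunk layout from the PNG specification (4-byte length, 4-byte type, data, then a CRC-32 over type and data), verifies each CRC, and decompresses the concatenated IDAT data:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes):
    """Yield (chunk_type, chunk_data, crc_ok) for every chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(body) != length or len(crc_bytes) != 4:
            raise ValueError(f"truncated {ctype!r} chunk at offset {pos}")
        (crc,) = struct.unpack(">I", crc_bytes)
        # The CRC covers the chunk type and data, not the length field.
        crc_ok = zlib.crc32(ctype + body) == crc
        yield ctype, body, crc_ok
        pos += 12 + length
        if ctype == b"IEND":
            break


def check_idat_stream(data: bytes) -> int:
    """Verify chunk CRCs, inflate all IDAT data, and return the inflated size."""
    idat = bytearray()
    for ctype, body, crc_ok in iter_chunks(data):
        if not crc_ok:
            raise ValueError(f"CRC mismatch in {ctype!r} chunk")
        if ctype == b"IDAT":
            idat += body
    # zlib.error is raised here if the compressed stream itself is corrupt.
    return len(zlib.decompress(bytes(idat)))
```

Running check_idat_stream on the bytes of the failing image either returns the size of the inflated scanline data or raises ValueError or zlib.error. A clean result would hint that the file's zlib stream is intact and that the problem lies in how the decoder drives libpng and zlib, though it would not prove it.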
@andfoy did you have the chance to look at this again?
@fmassa I haven't tried to compile zlib locally yet; I'll give it a go tomorrow!
Closing, since with #4101 torchvision will now fail gracefully.
@fmassa should we open another issue to track progress on supporting PNGs with more than 8 bits?
@NicolasHug yes, it would be good to have an issue to track support for PNGs with more than 8 bits.
🐛 Bug
torchvision.io.read_image() will sometimes segfault or abort in other uncatchable ways on malformed images, rather than failing gracefully (e.g. with a RuntimeError).
To Reproduce
Steps to reproduce the behavior:
torchvision.io.read_image:
Expected behavior
I expected that trying to read an unsupported or malformed image would instead raise a RuntimeError or other catchable error so that it could be handled in code, rather than aborting.
Environment
PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.20.0
Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1050
Nvidia driver version: 460.67
cuDNN version: /usr/local/cuda-10.2/lib64/libcudnn.so.7.6.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.8.1
[pip3] torchvision==0.9.1
Additional context
Something even stranger happens with this particular image: setting the mode to ImageReadMode.RGB allows it to be read once, but attempting to read it a second time fails as above (i.e. torchvision.io.read_image is not idempotent). I'm not sure whether this behavior is related, but whatever the root cause is, it would be nice to be able to just catch an error, e.g. to log the filename and skip the image during processing.
Some quick investigation shows that the problematic images that exhibit this behavior are usually PNGs with a bit depth of 16. OpenCV and PIL do not appear to have problems reading them.
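Since the failing images were mostly 16-bit PNGs, one possible stopgap is to detect them before calling read_image and hand them to PIL or OpenCV instead. A minimal sketch, assuming the layout from the PNG specification (8-byte signature, then the IHDR chunk, whose bit-depth byte sits at offset 24); png_bit_depth and is_high_bit_depth_png are hypothetical helper names:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_bit_depth(header: bytes):
    """Return the IHDR bit depth of a PNG, or None if `header` is not a PNG.

    Only the first 26 bytes are needed: the 8-byte signature, the IHDR
    length and type (8 bytes), width and height (8 bytes), then the
    bit-depth byte (offset 24) and the color-type byte (offset 25).
    """
    if len(header) < 26 or not header.startswith(PNG_SIGNATURE):
        return None
    if header[12:16] != b"IHDR":
        return None
    return header[24]


def is_high_bit_depth_png(path: str) -> bool:
    """True if `path` is a PNG whose samples use more than 8 bits."""
    with open(path, "rb") as f:
        depth = png_bit_depth(f.read(26))
    return depth is not None and depth > 8
```

In a loading loop, files for which is_high_bit_depth_png(path) returns True could be logged and skipped, or decoded with PIL, while the rest go through torchvision.io.read_image. This filters only on bit depth, so it would not catch other kinds of malformed files.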
Additionally, the error message sometimes changes, e.g. to Segmentation fault or double free or corruption (out).
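Because a segfault or a glibc abort kills the whole interpreter, no try/except around read_image can intercept it. One workaround, sketched here under the assumption that the decode function and its result can be pickled (as a module-level function returning a tensor can), is to run each decode in a throwaway worker process and treat a dead worker as an ordinary error. decode_isolated is a hypothetical helper, not a torchvision API:

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def decode_isolated(decode, path):
    """Call decode(path) in a throwaway worker process.

    Returns (result, None) on success, or (None, message) if decode raised
    a Python exception or the worker died abruptly (e.g. SIGSEGV/SIGABRT),
    so a crash in native code cannot take down the calling process.
    """
    # "fork" avoids re-importing the caller's main module in the child;
    # it is POSIX-only, so fall back to the platform default elsewhere.
    ctx = mp.get_context("fork") if "fork" in mp.get_all_start_methods() else None
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(decode, path)
        try:
            return future.result(), None
        except BrokenProcessPool:
            # The worker exited without reporting back: native crash or os._exit.
            return None, f"{path}: worker died abruptly (native crash?)"
        except Exception as exc:
            # Python exceptions raised by decode() are re-raised here.
            return None, f"{path}: {exc!r}"
```

For example, img, err = decode_isolated(torchvision.io.read_image, path) lets the caller log err and skip the file. Starting a process per image is slow, so for a large dataset one would rather screen files in a long-lived pool and recreate it whenever it breaks. Where "fork" is unavailable (e.g. on Windows), the calling code must sit under an if __name__ == "__main__": guard.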