loris-imageserver / loris

Loris IIIF Image Server

corrupted image converting .tiff #503

Open lsh-0 opened 4 years ago

lsh-0 commented 4 years ago

We have a .tiff image that is producing corrupted output.

Original: https://prod-elife-published.s3.amazonaws.com/digests/55692/digest-55692.tif

Corrupted: https://iiif.elifesciences.org/digests/55692%2Fdigest-55692.tif/full/full/0/default.webp

The same type of corruption appears for the .png and .jpg output formats. The corruption is different on each generation; however, we have caching turned on, so you'll need to add request parameters to see that (e.g. /digests/55692%2Fdigest-55692.tif/full/full/0/default.webp?a=z).
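As a rough illustration of the cache-busting trick described above, the sketch below appends a throwaway query parameter to a IIIF URL so that each request misses the cache. The helper name `cache_busted` and the parameter name `nocache` are made up for this example; any otherwise unused parameter (like `?a=z`) should behave the same way, assuming the cache keys on the full URL.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url: str, param: str = "nocache") -> str:
    """Return `url` with a random query parameter appended (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep any query parameters that are already present.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))


url = "https://iiif.elifesciences.org/digests/55692%2Fdigest-55692.tif/full/full/0/default.webp"
print(cache_busted(url))
```

Requesting two such URLs and comparing the downloaded bytes would be one way to confirm that the output differs between generations.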

We run Loris from a Docker container, so its state is fairly fixed. You can pull the image from here: https://hub.docker.com/r/elifesciences/loris

And see its definition here (forked from loris-docker): https://github.com/elifesciences/loris-docker

Version info:

bcail commented 4 years ago

$ mediainfo elife-digest-55692.tif
General
Complete name : elife-digest-55692.tif
Format : TIFF
File size : 214 KiB

Image
Title : ...
Format : LZW
Width : 540 pixels
Height : 523 pixels
Color space : RGB
Bit depth : 8 bits
Compression mode : Lossless

I wonder if the LZW format is causing issues. @lsh-0 do you know if other LZW images are working for you, or are they all failing?
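For sorting a batch of candidate files by compression scheme, a minimal sketch like the following could read the Compression tag straight from the TIFF header using only the standard library. It assumes a classic (non-BigTIFF) file and only inspects the first IFD; the function name `tiff_compression` is invented for this example. In the TIFF 6.0 specification, tag 259 is Compression, value 5 means LZW, and the default when the tag is absent is 1 (uncompressed).

```python
import struct

TIFF_COMPRESSION_TAG = 259  # "Compression" in the TIFF 6.0 specification
LZW = 5


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag value from the first IFD of a classic TIFF."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF: magic number is not 42")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, _type, _count = struct.unpack(order + "HHI", data[start:start + 8])
        if tag == TIFF_COMPRESSION_TAG:
            # A single SHORT is stored left-justified in the 4-byte value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return value
    return 1  # default when the tag is absent: no compression


# Usage, assuming the file has been downloaded locally:
# with open("digest-55692.tif", "rb") as f:
#     print(tiff_compression(f.read()) == LZW)
```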

lsh-0 commented 4 years ago

Thanks for getting back to me. Here is another LZW compressed .tif, it appears to be working fine:

Original: https://prod-elife-published.s3.amazonaws.com/digests/53232/digest-53232.tif

IIIF: https://iiif.elifesciences.org/digests/53232%2Fdigest-53232.tif/full/full/0/default.webp

I have about 50 candidates that I'm going to write a wee script to run through. All these images go through a review process and are then scrutinised on the site itself. If any are corrupted now, it's likely they were working fine in the previous version of Loris (circa v2.2.0).

lsh-0 commented 4 years ago

None of the other candidates exhibited corruption, which is good.

I have a side-project using ImageMagick that will compare two images for differences with a fuzz factor and a threshold for passing. I intend to run through all of our images and check for corruption that way. It would be good for our peace of mind to run this whenever we upgrade Loris, and it may even shake out more examples of this specific corruption.
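The comparison idea could be sketched in Python roughly as follows. This is not the ImageMagick-based project mentioned above, and its notion of "fuzz" is a simplified per-channel tolerance rather than ImageMagick's colour-distance definition. The function names, the default `fuzz` of 8 and the default `threshold` of 1% are all assumptions chosen for illustration. The core functions work on raw RGB bytes; `rgb_bytes` shows one way to obtain those with Pillow.

```python
def differing_fraction(a: bytes, b: bytes, fuzz: int = 0) -> float:
    """Fraction of RGB pixels where any channel differs by more than `fuzz`."""
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("expected two equal-length RGB byte strings")
    pixels = len(a) // 3
    if pixels == 0:
        return 0.0
    differing = sum(
        1
        for i in range(0, len(a), 3)
        if any(abs(a[i + c] - b[i + c]) > fuzz for c in range(3))
    )
    return differing / pixels


def images_match(a: bytes, b: bytes, fuzz: int = 8, threshold: float = 0.01) -> bool:
    """Pass when at most `threshold` of the pixels differ beyond `fuzz`."""
    return differing_fraction(a, b, fuzz) <= threshold


def rgb_bytes(path: str) -> bytes:
    """Decode an image file to raw RGB bytes with Pillow (imported lazily)."""
    from PIL import Image

    with Image.open(path) as im:
        return im.convert("RGB").tobytes()
```

Images with different dimensions produce byte strings of different lengths, so they are rejected with a `ValueError` rather than compared. Lossy output formats would likely need a larger `fuzz` than lossless ones.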

Please let me know if you have any further suggestions you'd like me to investigate.

alexwlchan commented 4 years ago

Pulling different versions of the image from https://hub.docker.com/r/elifesciences/loris/tags:

So something about this commit introduced the issue to your build: https://github.com/elifesciences/loris-docker/commit/984cb31ea1d5416bdc228d8ba9d6bd68fc4f08a9

These are all the changes in Loris between 2.3.3 and 3.0: https://github.com/loris-imageserver/loris/compare/v2.3.3...v3.0.0

lsh-0 commented 4 years ago

This is helpful, thank you. On Thurs or Fri I'll try upgrading the container to Ubuntu 20.04, which has a newer version of libtiff in it (4.1 vs 4.0). I looked at its changelog the other day but nothing jumped out at me.

alexwlchan commented 4 years ago

I think the issue isn’t libtiff; it might be the Python library Pillow.

Your working image had Pillow 4.3.0; your current image has 6.2.0. We used to pin the version of Pillow to avoid an issue with JPEG-compressed TIFs (see https://github.com/loris-imageserver/loris/pull/407, https://github.com/python-pillow/Pillow/issues/2926, https://github.com/loris-imageserver/loris/pull/485).

If I process your image with Pillow 7.0, I see the same corruption:

from PIL import Image

im = Image.open("digest-55692.tif")
im.save("digest-55692.jpg", quality=90)

So a short-term fix would be for you to pin the version of Pillow you use in your image (I can’t see where you install it in your Dockerfiles?).

I’ll have a look to see if the Pillow maintainers are aware of the issue.

lsh-0 commented 4 years ago

> Your working image

Ah, it wasn't actually working. That was a new 2.3.3 installation. I was migrating Loris to a containerised installation. It seemed to go well until we started seeing corruption in a handful of new images resulting in this rushed upgrade to 3.0.0.

Our previous working version was a patched ~2.0 (?) era release? Not sure. We relied on it crashing to re-request a different source format.

lsh-0 commented 4 years ago

Thanks for raising the issue with Pillow, @alexwlchan, it's appreciated.

lsh-0 commented 3 years ago

I can confirm that Loris 3.0.0, 3.2.0 and 3.2.1 with pillow==8.2.0 fix this corruption issue.

edit: and it looks like there is a PR for that already!
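A deployment could guard against regressing to an affected Pillow release with a small start-up check along these lines. This is only a sketch: the thread confirms pillow==8.2.0 as working and 6.2.0 and 7.0 as affected, so treating 8.2.0 as a minimum assumes later releases keep the fix. The names `KNOWN_GOOD`, `parse_version`, `pillow_is_known_good` and `installed_pillow_version` are invented for this example.

```python
from importlib import metadata

# Assumption taken from the reports in this thread: corruption was seen with
# Pillow 6.2.0 and 7.0, and pillow==8.2.0 was confirmed to fix it.
KNOWN_GOOD = (8, 2, 0)


def parse_version(version: str) -> tuple:
    """Parse the leading numeric components of a version string such as '8.2.0'."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1'
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def pillow_is_known_good(version: str) -> bool:
    """True when `version` is at least the release reported as fixed."""
    return parse_version(version) >= KNOWN_GOOD


def installed_pillow_version():
    """Return the installed Pillow version string, or None if it is not installed."""
    try:
        return metadata.version("Pillow")
    except metadata.PackageNotFoundError:
        return None
```

Pinning the version in the image's requirements remains the simpler fix; a check like this only makes an accidental downgrade visible at start-up.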