Closed jcallaha closed 2 years ago
After giving this some thought, I think this works as intended. PIL expects these bytes to be compressed (and indeed, from the tests you can verify that PIL is able to extract an image from a PDF).
Hi Joris - thanks for the quick response!
I think I now see the full issue: in my case the stream is set as /Filter[/FlateDecode/DCTDecode], so the JPEG image is additionally Flate-compressed. If the image stream is just DCTDecode, then PIL does the right thing with the stream bytes directly, but in that case the transform is actually done by jpeg_image_transformer.py, because it is added first and it only looks for DCTDecode on the image stream.
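To illustrate the double encoding described above, here is a minimal sketch that uses only the standard library. The `fake_jpeg` payload and the `looks_like_jpeg` helper are made up for illustration; the only thing it relies on is that JPEG data begins with the bytes FF D8.

```python
import zlib

JPEG_SOI = b"\xff\xd8"  # JPEG data always starts with this Start Of Image marker

# Stand-in for real JPEG data (assumption: only the leading marker matters here).
fake_jpeg = JPEG_SOI + b"\x00" * 64

# Roughly what a stream with /Filter [/FlateDecode /DCTDecode] stores:
# the JPEG bytes, deflated on top.
stream_bytes = zlib.compress(fake_jpeg)


def looks_like_jpeg(data: bytes) -> bool:
    """Hypothetical helper: check for the JPEG Start Of Image marker."""
    return data.startswith(JPEG_SOI)


# The raw stream bytes are zlib data, not a JPEG, so an image decoder
# cannot use them directly.
raw_is_jpeg = looks_like_jpeg(stream_bytes)

# Undoing the first filter (FlateDecode) yields the actual JPEG bytes.
inflated_is_jpeg = looks_like_jpeg(zlib.decompress(stream_bytes))
```

With a single /DCTDecode filter the stored bytes are already the JPEG, which is why that case works without any extra step.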
It looks like the code in compressed_jpeg_image_transformer.py intends to handle the case of the double encoding because it returns True for can_be_transformed() if the Filter is either just "DCTDecode" or if it is an array of filters with DCTDecode at the end:
    object["Filter"] == "DCTDecode"
    or (
        isinstance(object["Filter"], list)
        and len(object["Filter"]) > 1
        and object["Filter"][-1] == "DCTDecode"
    )
I think the first case (object["Filter"] == "DCTDecode") is already handled by jpeg_image_transformer.py, so compressed_jpeg_image_transformer.py could probably be simplified to check only the array case. I'm not sure how to proceed, and I wish I had a simple test case I could share.
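One possible shape for that simplification, as a sketch only: the function name `is_flate_wrapped_jpeg` is hypothetical, and a plain dict with string filter names stands in for the library's own dictionary and name objects.

```python
def is_flate_wrapped_jpeg(obj: dict) -> bool:
    """Hypothetical predicate: True only for a filter array ending in DCTDecode.

    The single-filter case ("DCTDecode" on its own) is deliberately excluded,
    on the assumption that the plain JPEG transformer already claims it.
    """
    filters = obj.get("Filter")
    return (
        isinstance(filters, list)
        and len(filters) > 1
        and filters[-1] == "DCTDecode"
    )


# Example inputs (plain dicts standing in for image stream dictionaries).
double_encoded = {"Filter": ["FlateDecode", "DCTDecode"]}
plain_jpeg = {"Filter": "DCTDecode"}
flate_only = {"Filter": ["FlateDecode"]}
```

Here `double_encoded` matches, while `plain_jpeg` and `flate_only` do not.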
After calling decode_stream() to inflate the image stream before handing it off to PIL, reference "DecodedBytes" instead of "Bytes": "Bytes" holds the original (still compressed) stream data, not the decoded data.
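A rough sketch of that fix follows. The `decode_stream_stand_in` function below is NOT the library's decode_stream(); it is a hypothetical stand-in that only mimics the behaviour described in this thread (leave "Bytes" untouched, store the decoded result under "DecodedBytes"), and a plain dict stands in for the stream object.

```python
import zlib


def decode_stream_stand_in(stream: dict) -> dict:
    """Hypothetical stand-in for decode_stream(): undo a leading FlateDecode.

    "Bytes" is left as the original stream data; the inflated result is
    stored separately under "DecodedBytes".
    """
    filters = stream["Filter"]
    if isinstance(filters, str):
        filters = [filters]
    data = stream["Bytes"]
    if filters and filters[0] == "FlateDecode":
        data = zlib.decompress(data)
    stream["DecodedBytes"] = data
    return stream


# Fake JPEG payload (assumption: just the FF D8 marker plus padding).
jpeg_payload = b"\xff\xd8" + b"\x00" * 32
stream = {
    "Filter": ["FlateDecode", "DCTDecode"],
    "Bytes": zlib.compress(jpeg_payload),
}

decode_stream_stand_in(stream)

# Wrong: still zlib data. Right: the JPEG bytes an image decoder expects.
wrong_input = stream["Bytes"]
right_input = stream["DecodedBytes"]

# With Pillow installed, the decoded bytes could then be opened with
# PIL.Image.open(io.BytesIO(right_input)), given a real JPEG payload.
```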