py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

Wrong colors for some images with custom palette #2308

Closed stefan6419846 closed 5 months ago

stefan6419846 commented 10 months ago

Some images seem to be extracted with the wrong colors. Apparently the custom palette is not considered in these cases. The final byte string from

['/Indexed', '/DeviceRGB', 164, b'\xed\x1c$\xed\x1e&\xed \'\xed (\xed"*\xee#+\xee$,\xee&-\xee\'.\xee(/\xee(0\xee*2\xee,3\xee-4\xee.5\xee/6\xef07\xef29\xef3:\xef4;\xef5<\xef6=\xef7>\xef8?\xef9@\xef:A\xef;B\xf0<C\xf0=D\xf0>E\xf0@F\xf0AH\xf0BI\xf0CJ\xf0EL\xf0FL\xf0HN\xf1IP\xf1JQ\xf1KR\xf1LR\xf1NT\xf1PV\xf1RX\xf1TZ\xf2U[\xf2V\\\xf2X^\xf2Z`\xf2\\b\xf2^d\xf2af\xf3bh\xf3ej\xf3fk\xf3gl\xf3hm\xf3in\xf3jp\xf3lq\xf3ns\xf4ot\xf4pu\xf4rw\xf4sx\xf4uz\xf4v{\xf4w|\xf4x|\xf4y~\xf4z\x7f\xf5{\x80\xf5~\x82\xf5\x80\x84\xf5\x81\x86\xf5\x82\x87\xf5\x84\x88\xf5\x86\x8a\xf6\x88\x8c\xf6\x89\x8e\xf6\x8a\x8e\xf6\x8c\x90\xf6\x8e\x92\xf6\x90\x94\xf6\x92\x95\xf6\x92\x96\xf7\x94\x98\xf7\x97\x9b\xf7\x99\x9c\xf7\x9a\x9d\xf7\x9a\x9e\xf7\x9c\xa0\xf7\x9e\xa1\xf7\x9f\xa2\xf8\xa1\xa4\xf8\xa2\xa5\xf8\xa3\xa7\xf8\xa4\xa7\xf8\xa5\xa8\xf8\xa6\xa9\xf8\xa8\xab\xf8\xa9\xac\xf8\xaa\xad\xf8\xab\xae\xf8\xac\xaf\xf9\xae\xb1\xf9\xb0\xb3\xf9\xb2\xb4\xf9\xb3\xb6\xf9\xb4\xb7\xf9\xb5\xb8\xf9\xb8\xba\xf9\xba\xbc\xfa\xba\xbd\xfa\xbd\xbf\xfa\xbe\xc0\xfa\xc0\xc2\xfa\xc2\xc5\xfa\xc4\xc6\xfb\xc7\xc9\xfb\xc8\xca\xfb\xca\xcb\xfb\xca\xcc\xfb\xcc\xce\xfb\xce\xd0\xfb\xd0\xd1\xfb\xd1\xd2\xfb\xd2\xd3\xfb\xd2\xd4\xfc\xd3\xd5\xfc\xd4\xd6\xfc\xd6\xd7\xfc\xd7\xd8\xfc\xd8\xd9\xfc\xd9\xda\xfc\xdb\xdd\xfc\xdc\xde\xfc\xde\xdf\xfc\xdf\xe0\xfd\xe0\xe1\xfd\xe1\xe2\xfd\xe3\xe4\xfd\xe4\xe5\xfd\xe5\xe6\xfd\xe6\xe7\xfd\xe7\xe8\xfd\xe8\xe9\xfd\xe9\xea\xfd\xea\xea\xfd\xeb\xec\xfe\xec\xed\xfe\xed\xee\xfe\xee\xef\xfe\xef\xf0\xfe\xf1\xf1\xfe\xf3\xf3\xfe\xf3\xf4\xfe\xf4\xf5\xfe\xf5\xf6\xfe\xf6\xf6\xfe\xf8\xf8\xff\xfa\xfa\xff\xfb\xfc\xff\xfc\xfc\xff\xfe\xfe']

apparently gets ignored, as img.putpalette(color_space[-1]) yields the correct colors (red instead of the gray currently returned by pypdf).
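For readers unfamiliar with indexed color spaces, here is a minimal sketch of what "considering the palette" means. It is not pypdf's actual implementation. The helper name `expand_indexed` is hypothetical, and the lookup table contains only the first three entries of the byte string above. With an `/Indexed` color space over `/DeviceRGB`, each pixel value is an index into the lookup table, which holds three bytes (R, G, B) per entry:

```python
from typing import List, Tuple


def expand_indexed(indices: bytes, lookup: bytes, components: int = 3) -> List[Tuple[int, ...]]:
    """Map each palette index to its color tuple from the lookup table.

    Hypothetical helper for illustration only; not part of pypdf.
    """
    # Highest valid index, derived from the table length (164 in the report).
    hival = len(lookup) // components - 1
    pixels = []
    for index in indices:
        # Assumption: out-of-range indices are clamped to hival.
        index = min(index, hival)
        start = index * components
        pixels.append(tuple(lookup[start:start + components]))
    return pixels


# First three palette entries copied from the byte string in this report.
lookup = b"\xed\x1c$\xed\x1e&\xed '"

pixels = expand_indexed(bytes([0, 1, 2, 0]), lookup)
print(pixels)
# [(237, 28, 36), (237, 30, 38), (237, 32, 39), (237, 28, 36)]
```

All three entries are strongly red, which matches the expected result. If the palette is skipped and the raw indices are interpreted as gray levels, the same pixels come out as shades of gray.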

Note: This was originally reported as (2) in https://github.com/py-pdf/pypdf/issues/2303#issuecomment-1823199023, but since it is a separate issue, I have isolated it here.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.97-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

for index, page in enumerate(PdfReader('out1.pdf').pages):
    print(index, page)
    for key in page.images.keys():
        print(key)
        print(page.images[key].indirect_reference)
        page.images[key].image.convert("RGB").save("image.png")

The PDF file is the same as in https://github.com/py-pdf/pypdf/issues/2303#issue-2002452594, although it requires some additional work due to the EOD issue there.

pubpub-zz commented 5 months ago

@stefan6419846 I can't reproduce this issue (anymore?): the image is red. Can you confirm and close it if so?

stefan6419846 commented 5 months ago

Yes, this seems to have indeed been fixed in the meantime. The top-down representation is unrelated to this case.

pubpub-zz commented 5 months ago

The top-down representation is unrelated to this case.

There is no issue with the top-down orientation: the image is stored like this and is just flipped on the page.

stefan6419846 commented 5 months ago

Yes, no worries - this has not been part of the ticket anyway ;)

pubpub-zz commented 5 months ago

However, closing this issue lowers the number of open issues below 75! 🎉🎉🎉🎉