py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

OSError: cannot write mode PA as PNG #1961

Closed JeevansSP closed 1 year ago

JeevansSP commented 1 year ago

I was trying to iterate through the images on a page but ran into the error below.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Windows-10-10.0.22621-SP0

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.12.1

Code + PDF

This is a minimal, complete example that shows the issue:

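from PyPDF2 import PdfReader


def test():
    pdf_reader = PdfReader("path/to/above/pdf")

    # Map each page index to the raw bytes of the images found on that page.
    page_content = {}
    for idx, page in enumerate(pdf_reader.pages):
        page_content[idx] = {"images": []}

        # Accessing page.images is what raises the OSError shown below.
        for image in page.images:
            page_content[idx]["images"].append(image.data)

    return page_content


test()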

Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!

handsample.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "D:\xtransmatrix\ocr_summarizer_poc\env\lib\site-packages\PIL\PngImagePlugin.py", line 1286, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'PA'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\xtransmatrix\ocr_summarizer_poc\utils.py", line 279, in <module>
    test()
  File "d:\xtransmatrix\ocr_summarizer_poc\utils.py", line 273, in test
    for image in page.images:
  File "D:\xtransmatrix\ocr_summarizer_poc\env\lib\site-packages\PyPDF2\_page.py", line 481, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
  File "D:\xtransmatrix\ocr_summarizer_poc\env\lib\site-packages\PyPDF2\filters.py", line 621, in _xobj_to_image
    img.save(img_byte_arr, format="PNG")
  File "D:\xtransmatrix\ocr_summarizer_poc\env\lib\site-packages\PIL\Image.py", line 2432, in save
    save_handler(self, fp, filename)
  File "D:\xtransmatrix\ocr_summarizer_poc\env\lib\site-packages\PIL\PngImagePlugin.py", line 1289, in _save
    raise OSError(msg) from e
OSError: cannot write mode PA as PNG

TODO:

The image needs to be converted to "RGB" in PyPDF2/filters.py before it is saved, because Pillow's PNG writer does not support mode "PA".
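The idea can be sketched with Pillow alone, independent of PyPDF2. This is only an illustration, not the code from the library or from any PR: the helper name `to_png_bytes` is hypothetical, and converting to "RGBA" rather than "RGB" is an assumption made here so that the alpha channel of a "PA" (palette plus alpha) image is kept.

```python
import io

from PIL import Image


def to_png_bytes(img: Image.Image) -> bytes:
    """Encode a Pillow image as PNG bytes (hypothetical helper)."""
    # Pillow's PNG writer rejects mode "PA", so convert it first.
    # "RGBA" is chosen here (an assumption) to preserve transparency;
    # use "RGB" instead if the alpha channel should be dropped.
    if img.mode == "PA":
        img = img.convert("RGBA")
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return buf.getvalue()


# Usage: a small synthetic "PA" image stands in for the image from the PDF.
pa_image = Image.new("PA", (4, 4))
png_data = to_png_bytes(pa_image)
```

Images in other modes pass through this helper unchanged, so it only affects the case reported in the traceback above.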

JeevansSP commented 1 year ago

I raised PR #1962, which fixes the issue.

pubpub-zz commented 1 year ago

Hi @JeevansSP, thanks for your contribution, but I think the issue has already been solved by the latest changes. Can you upgrade to 3.12.1 and confirm my results?

Note for other testers: the image flip is expected.

MartinThoma commented 1 year ago

Thank you for reporting the issue, but we will not work on it because PyPDF2 is deprecated.

We now focus on pypdf and will not update PyPDF2.