py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

page.images leads to TypeError: argument of type 'NullObject' is not iterable #1737

Closed pubpub-zz closed 1 year ago

pubpub-zz commented 1 year ago

Discussed in https://github.com/py-pdf/pypdf/discussions/1732

Originally posted by **commuter77** March 21, 2023

I tried the sample code from https://pypdf2.readthedocs.io/en/stable/user/extract-images.html but I got TypeError: argument of type 'NullObject' is not iterable. Does anyone have the same issue? I confirmed that my PDF file contains an image. The error in detail:

```
  testpdf2.py", line 14, in
    num_of_images = len(page.images())
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\_page.py", line 481, in images
    extension, byte_stream = _xobj_to_image(x_object[obj])
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\filters.py", line 576, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\generic\_data_structures.py", line 827, in get_data
    decoded._data = decode_stream_data(self)
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\filters.py", line 538, in decode_stream_data
    data = CCITTFaxDecode.decode(data, stream.get(SA.DECODE_PARMS), height)
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\filters.py", line 463, in decode
    parms = CCITTFaxDecode._get_parameters(decode_parms, height)
  File "C:\mySoft\Python39\lib\site-packages\PyPDF2\filters.py", line 441, in _get_parameters
    if CCITT.COLUMNS in decode_parm:
TypeError: argument of type 'NullObject' is not iterable
```

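The last frame of the traceback shows a membership test (`in`) applied to a decode-parameters value that turned out to be a `NullObject`. The sketch below reproduces that failure mode without any PDF library and shows one defensive way to guard against it. Everything here is an illustrative assumption: the `NullObject` class is a stand-in, the function names are hypothetical, and this is not the actual patch that was merged into pypdf.

```python
class NullObject:
    """Stand-in for a PDF null value. It defines neither __contains__
    nor __iter__, so the `in` operator cannot be applied to it."""


COLUMNS = "/Columns"    # CCITTFaxDecode parameter key
DEFAULT_COLUMNS = 1728  # default for /Columns in the PDF specification


def get_columns_unguarded(decode_parms):
    # Mirrors the failing line: a membership test on whatever was passed in.
    if COLUMNS in decode_parms:
        return decode_parms[COLUMNS]
    return DEFAULT_COLUMNS


def get_columns_guarded(decode_parms):
    # Hypothetical guard: only test membership when the value is a mapping.
    if isinstance(decode_parms, dict) and COLUMNS in decode_parms:
        return decode_parms[COLUMNS]
    return DEFAULT_COLUMNS


error_message = None
try:
    get_columns_unguarded(NullObject())
except TypeError as exc:
    # The exact wording depends on the Python version.
    error_message = str(exc)

columns_from_null = get_columns_guarded(NullObject())
columns_from_dict = get_columns_guarded({COLUMNS: 2480})
```

With the guard in place, a null parameters entry falls back to the default instead of raising, while a real dictionary is still honored.
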
MartinThoma commented 1 year ago

@pubpub-zz Thank you for fixing the issue :pray:

The fix was just merged to main. It will be released tomorrow in the first pypdf version after 3.6.0.
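Since the fix is announced for a pypdf release later than 3.6.0, a quick way to see whether an environment should already have it is to compare the installed version against that baseline. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the simple parser only looks at the leading numeric parts of the version string.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(version_string):
    """Turn '3.6.0' into (3, 6, 0); stops at the first non-numeric part."""
    parts = []
    for part in version_string.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def is_newer_than(installed, baseline="3.6.0"):
    # Tuples compare element by element, so 3.10.0 sorts after 3.6.0.
    return release_tuple(installed) > release_tuple(baseline)


def installed_pypdf_has_fix():
    # Returns False when pypdf is not installed at all.
    try:
        return is_newer_than(version("pypdf"))
    except PackageNotFoundError:
        return False
```

Calling `installed_pypdf_has_fix()` returns `True` only when a pypdf version above 3.6.0 is installed. The traceback above comes from the older `PyPDF2` package, which this check does not cover.
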

pubpub-zz commented 1 year ago

I've prepared a sample file with the text removed, to make the document more acceptable to share: tt1.pdf