py-pdf / pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
https://pypdf.readthedocs.io/en/latest/

AttributeError: 'pageObject' object has no attribute 'has_key' #640

Closed gilramanoj closed 2 years ago

gilramanoj commented 3 years ago

PyPDF2 is throwing the attached error.

Actually, I need to list out broken links in the PDF file. Please suggest.

Regards, Manoj

Joshua-IRT commented 3 years ago

Which version of Python are you using? The .has_key method was removed in Python 3.0: https://docs.python.org/3.1/whatsnew/3.0.html#builtins

gilramanoj commented 3 years ago

Hi,

Thanks for your reply! I am using Python 3.10 version.

Regards, Manoj

johns1c commented 2 years ago

Dear Manoj,

Can you attach or link to sample PDFs, ideally containing both good and broken links? I will have a go at testing this.

Chris Johnson

pubpub-zz commented 2 years ago

@gilramanoj, can you please provide a PDF and the code so we can investigate? Thanks.

MasterOdin commented 2 years ago

Closing this as it's an issue in the user's code, not in PyPDF2 itself.

As @Joshua-IRT indicates, the .has_key method on dictionary-like objects (which PageObject is) was removed in Python 3; the in operator is the recommended replacement.

An example:

>>> from PyPDF2 import PdfReader
>>> pdf = PdfReader('PDF_Samples/GeoBase_NHNC1_Data_Model_UML_EN.pdf')
>>> pageObject = pdf.pages[0]
>>> pageObject
{'/Type': '/Page', '/Parent': IndirectObject(2, 0), '/Resources': {'/Font': {'/F1': IndirectObject(5, 0), '/F2': IndirectObject(8, 0), '/F3': IndirectObject(10, 0), '/F4': IndirectObject(12, 0), '/F5': IndirectObject(17, 0)}, '/XObject': {'/Image7': IndirectObject(7, 0), '/Image21': IndirectObject(21, 0)}, '/ProcSet': ['/PDF', '/Text', '/ImageB', '/ImageC', '/ImageI']}, '/Annots': [IndirectObject(19, 0), IndirectObject(20, 0)], '/MediaBox': [0, 0, 612, 792], '/Contents': IndirectObject(4, 0), '/Group': {'/Type': '/Group', '/S': '/Transparency', '/CS': '/DeviceRGB'}, '/Tabs': '/S', '/StructParents': 0}
>>> 'foo' in pageObject
False
>>> '/Type' in pageObject
True
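
For code that still calls .has_key, the migration could look roughly like the sketch below. It is only an illustration: plain dicts stand in for the dictionary-like PageObject so that it runs without a PDF file, the annotation layout is an assumed minimal one, and link_uris is a hypothetical helper, not part of PyPDF2. With a real PdfReader, the entries of '/Annots' are typically IndirectObject instances that would need to be resolved first.

```python
# Plain dicts stand in for PyPDF2's dictionary-like PageObject (an assumption
# made so that this sketch runs without a PDF file).
page = {
    "/Type": "/Page",
    "/Annots": [
        {"/Subtype": "/Link", "/A": {"/S": "/URI", "/URI": "https://example.com/a"}},
        {"/Subtype": "/Text"},
        {"/Subtype": "/Link", "/A": {"/S": "/GoTo"}},
    ],
}


def link_uris(page):
    """Return the URI of every link annotation that has one (hypothetical helper)."""
    uris = []
    # Python 2 code would have written page.has_key("/Annots") here.
    if "/Annots" in page:
        for annot in page["/Annots"]:
            # .get() avoids a KeyError for annotations without an action.
            action = annot.get("/A", {})
            if annot.get("/Subtype") == "/Link" and "/URI" in action:
                uris.append(action["/URI"])
    return uris


print(hasattr(page, "has_key"))  # False on Python 3
print(link_uris(page))           # ['https://example.com/a']
```

Checking whether each collected URI is actually reachable, which is what listing broken links requires, would be a separate step that is not shown here.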