jsvine / pdfplumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
MIT License

TypeError: unsupported operand type(s) for %: 'NoneType' and 'int' when trying to access PDF page objects #827

Closed thefirebanks closed 1 year ago

thefirebanks commented 1 year ago

Describe the bug

Accessing pdf.pages raises a TypeError for one specific PDF file:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[21], line 2
      1 with pdfplumber.open(file) as pdf:
----> 2     print(pdf.pages)

File ~/opt/miniconda3/envs/py39_nlp/lib/python3.9/site-packages/pdfplumber/pdf.py:120, in PDF.pages(self)
    118 if pp is not None and page_number not in pp:
    119     continue
--> 120 p = Page(self, page, page_number=page_number, initial_doctop=doctop)
    121 self._pages.append(p)
    122 doctop += p.height

File ~/opt/miniconda3/envs/py39_nlp/lib/python3.9/site-packages/pdfplumber/page.py:94, in Page.__init__(self, pdf, page_obj, page_number, initial_doctop)
     92 self.page_number = page_number
     93 _rotation = resolve_all(self.page_obj.attrs.get("Rotate", 0))
---> 94 self.rotation = _rotation % 360
     95 self.page_obj.rotate = self.rotation
     96 self.initial_doctop = initial_doctop

TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'
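
One plausible reading of the traceback, offered as a sketch and not as the library's actual fix: the default in attrs.get("Rotate", 0) applies only when the key is absent, so a "Rotate" entry that is present but resolves to None still reaches the % 360 step. The snippet below reproduces that failure with a plain dict standing in for the page attributes. The helper normalize_rotation is hypothetical and is not part of pdfplumber; the assumption that the sample file carries a null "Rotate" value is also unverified.

```python
def normalize_rotation(raw):
    """Hypothetical helper (not part of pdfplumber): treat a null value as 0 degrees."""
    return (raw or 0) % 360


# Stand-in for a page attribute dictionary in which "Rotate" is present
# but resolves to None (an assumption about the sample file).
attrs = {"Rotate": None}

# The default passed to .get() is used only when the key is missing,
# so the result here is still None.
raw = attrs.get("Rotate", 0)

try:
    raw % 360
except TypeError as exc:
    # Same failure mode as line 94 of page.py in the traceback.
    error_message = str(exc)

# Guarding against None before taking the modulo avoids the error.
rotation = normalize_rotation(raw)
```

With this guard, a null rotation is treated the same as an unrotated page, and ordinary values such as 90 or 450 still reduce to the 0-359 range.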

Code to reproduce the problem

import pdfplumber

file = "sample_file.pdf"
with pdfplumber.open(file) as pdf:
    print(pdf.pages)

PDF file

sample_file.pdf

Expected behavior

I should be able to access the page objects from the PDF. The same file opens without errors in PyMuPDF.

Actual behavior

Accessing pdf.pages raises the TypeError shown in the traceback above.

Environment

Python 3.9 (conda environment, per the paths in the traceback); pdfplumber version not stated.

samkit-jain commented 1 year ago

Resolved in https://github.com/jsvine/pdfplumber/pull/811