pmaupin / pdfrw

pdfrw is a pure Python library that reads and writes PDFs

Unexpected OverflowError on malformed PDF #210

Open Google-Autofuzz opened 3 years ago

Google-Autofuzz commented 3 years ago

Running the following code with the latest PyPI version of pdfrw on the attached input raises an unexpected OverflowError:

import io
import sys
from pdfrw import PdfReader

with open(sys.argv[1], 'rb') as f:
    PdfReader(io.BytesIO(f.read()))
$ python3 pypdf2_repro.py ../test.pdf
[WARNING] pdfreader.py:581 PDF header not at beginning of file
[WARNING] pdfreader.py:599 Extra data at end of file
Traceback (most recent call last):
  File "pdfreader_repro.py", line 6, in <module>
    PdfReader(io.BytesIO(f.read()))
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/pdfreader.py", line 619, in __init__
    trailer, is_stream = self.parsexref(source)
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/pdfreader.py", line 453, in parsexref
    tok = next()
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/tokens.py", line 88, in _gettoks
    for match in findtok(fdata, current[0][1]):
OverflowError: Python int too large to convert to C ssize_t
$ 

test.pdf
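
The traceback ends in `findtok(fdata, current[0][1])`, where the second argument looks like a start offset. Assuming `findtok` is the `finditer` method of a compiled regular expression (not confirmed in this report), the same error class can be shown with the standard library alone. The sketch below is illustrative only: `find_tokens`, `tokens_from`, and the pattern are hypothetical stand-ins, not pdfrw code.

```python
import re
import sys

# Hypothetical stand-in for a tokenizer pattern; this is NOT pdfrw's regex.
find_tokens = re.compile(r"\S+").finditer

data = "1 0 obj"
# An offset that does not fit in a C ssize_t, such as one that might be
# read from a malformed file (assumption about the root cause).
huge_offset = sys.maxsize + 1

try:
    list(find_tokens(data, huge_offset))
    overflowed = False
except OverflowError:
    # CPython rejects the position before any matching happens.
    overflowed = True


def tokens_from(data, offset):
    """Return tokens starting at offset, rejecting offsets outside the data."""
    if not 0 <= offset <= len(data):
        raise ValueError(
            "offset %d is outside the data (length %d)" % (offset, len(data))
        )
    return [m.group(0) for m in find_tokens(data, offset)]
```

Checking the offset against the buffer length before it reaches the regex turns the low-level OverflowError into an error that describes the malformed input. Whether pdfrw should raise its own parse error here is a design decision for the maintainers.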