Running the following code with the latest PyPI version of pdfrw on the attached input results in an unexpected OverflowError:
import io
import sys
from pdfrw import PdfReader
with open(sys.argv[1], 'rb') as f:
    PdfReader(io.BytesIO(f.read()))
$ python3 pdfreader_repro.py ../test.pdf
[WARNING] pdfreader.py:581 PDF header not at beginning of file
[WARNING] pdfreader.py:599 Extra data at end of file
Traceback (most recent call last):
  File "pdfreader_repro.py", line 6, in <module>
    PdfReader(io.BytesIO(f.read()))
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/pdfreader.py", line 619, in __init__
    trailer, is_stream = self.parsexref(source)
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/pdfreader.py", line 453, in parsexref
    tok = next()
  File "/home/user/.local/lib/python3.8/site-packages/pdfrw/tokens.py", line 88, in _gettoks
    for match in findtok(fdata, current[0][1]):
OverflowError: Python int too large to convert to C ssize_t
$
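For triage, the last frame can probably be isolated without pdfrw. The sketch below assumes that `findtok` is the `finditer` method of a compiled regular expression and that the start position passed to it is an offset read from the input file; the traceback alone does not confirm either. The pattern, the helpers `find_tokens` and `find_tokens_checked`, the sample data, and the offset value are all made up for illustration and are not taken from pdfrw.

```python
import re

# Hypothetical stand-in for a tokenizer regex; not pdfrw's actual pattern.
TOKEN_RE = re.compile(rb"\S+")


def find_tokens(data, start):
    """Eagerly collect all whitespace-separated tokens from `start` onwards."""
    return [m.group() for m in TOKEN_RE.finditer(data, start)]


def find_tokens_checked(data, start):
    """Same as find_tokens, but reject offsets that lie outside the buffer.

    This is one possible guard, shown as an assumption, not a pdfrw patch.
    """
    if not 0 <= start <= len(data):
        raise ValueError(
            "offset %d is outside the data (length %d)" % (start, len(data))
        )
    return find_tokens(data, start)


data = b"%PDF-1.4 trailer startxref"

# An in-range start position works as expected.
print(find_tokens(data, 9))

# A start position that does not fit in a C ssize_t raises OverflowError,
# the same exception type as in the traceback.
huge_offset = 2 ** 64  # illustrative value only
try:
    find_tokens(data, huge_offset)
except OverflowError as exc:
    print("OverflowError:", exc)

# With the guard, the same input produces a ValueError that names the offset.
try:
    find_tokens_checked(data, huge_offset)
except ValueError as exc:
    print("ValueError:", exc)
```

If this matches what happens inside pdfrw, validating the offset before tokenizing would turn the crash into a regular parse error that callers can handle.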
Attached input: test.pdf