Open SebastianDeiss opened 6 years ago
@jesparza I could submit a pull request for this issue, like https://github.com/jbremer/peepdf/pull/6, which is based on your master.
google.com?q=filetype:pdf https://en.fh-westkueste.de/students/his/ These PDF files, created by HIS, also produce an error. Could it be related?
An extended fix for the TypeErrors is now over at jbremer/peepdf#9.
I have also seen some of those HIS-generated PDFs (which originate from Apache FOP 2.3). They only ran into the object stream parsing problem caused by PDFParser.readUntilSymbol() resetting the buffer cursor, which is fixed by commit 1 of that PR, but not into the TypeErrors. (That separate issue actually only exists in jbremer's fork.)
peepdf crashes with a TypeError if some PDFs are analyzed in force parsing mode and PDFObjectStream.resolveReferences() is invoked.

If I fix that TypeError by converting offset at PDFCore.py:3243 to an int object, I get another one:

A possible solution would be to supply the PDFParser object to PDFObjectStream when creating that instance and then provide the supplied PDFParser instance for readObject().
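To make the proposal concrete, here is a minimal, self-contained sketch of the two ideas: normalising offsets to int and handing the parser to the object stream at construction time. It is not peepdf's code. SketchParser, SketchObjectStream, read_object() and resolve() are hypothetical stand-ins, and the assumption that the offsets arrive as strings is mine, not something confirmed by the traceback.

```python
class SketchParser:
    """Hypothetical stand-in for a parser; not peepdf's PDFParser."""

    def read_object(self, data, offset):
        # bytes.find() and slicing need an integer offset; passing a str
        # here raises TypeError, which is the kind of failure sketched.
        end = data.find(b" ", offset)
        if end == -1:
            end = len(data)
        return data[offset:end]


class SketchObjectStream:
    """Hypothetical stand-in for an object stream holding several objects."""

    def __init__(self, data, offsets, parser):
        self.data = data
        # Assumption: offsets may arrive as strings, so convert them once.
        self.offsets = [int(o) for o in offsets]
        # The parser is supplied by the caller instead of being looked up
        # or recreated inside the stream.
        self.parser = parser

    def resolve(self):
        # Reuse the supplied parser instance for every contained object.
        return [self.parser.read_object(self.data, off) for off in self.offsets]


stream = SketchObjectStream(b"true 42 null", ["0", "5", "8"], SketchParser())
objects = stream.resolve()
```

Passing the parser in keeps the stream from depending on any state outside itself, which seems to be what the second TypeError calls for, but whether the real readObject() needs more than the instance is something the actual patch would have to settle.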