torakiki / sambox

A PDFBox fork intended to be used as PDF processor for Sejda and PDFsam
Apache License 2.0

Handle invalid header #78

Closed fostersimonj closed 7 years ago

fostersimonj commented 7 years ago

I came across a PDF whose file begins with these lines, so the `%PDF-` header is not at the very start:

2 J
%PDF-1.7
3 0 obj

Acrobat Reader and other libraries (e.g. poppler's pdfinfo, Apache Tika) handle this, but SAMBox fails here because the first character on the first line is a digit:

Caused by: java.io.IOException: Unable to find expected file header
    at org.sejda.sambox.input.PDFParser.readHeader(PDFParser.java:176)
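
For illustration, a more tolerant header check could scan the first bytes of the file for the `%PDF-` marker instead of requiring it at offset 0. The sketch below is not SAMBox's actual fix. The class name `LenientHeaderFinder`, the method `findHeaderOffset`, and the 1024-byte search window are assumptions made for this example; the window is chosen to mirror the leniency that the PDF Reference attributes to Acrobat viewers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of SAMBox.
public class LenientHeaderFinder {

    // The marker that starts every PDF header line, e.g. "%PDF-1.7".
    private static final byte[] MARKER = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Assumption: only the first 1024 bytes are searched.
    private static final int SEARCH_LIMIT = 1024;

    /**
     * Returns the offset of the first "%PDF-" marker that lies entirely
     * within the first SEARCH_LIMIT bytes of data, or -1 if there is none.
     */
    public static int findHeaderOffset(byte[] data) {
        int lastStart = Math.min(data.length, SEARCH_LIMIT) - MARKER.length;
        for (int i = 0; i <= lastStart; i++) {
            boolean match = true;
            for (int j = 0; j < MARKER.length; j++) {
                if (data[i + j] != MARKER[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The leading bytes reported in this issue: junk line, then the header.
        byte[] data = "2 J\n%PDF-1.7\n3 0 obj\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(findHeaderOffset(data)); // prints 4
    }
}
```

A parser using this approach would treat everything before the returned offset as leading garbage and start reading the version string from the marker onward, reporting a missing header only when the method returns -1.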

torakiki commented 7 years ago

Fixed and released.