feinerer closed this issue 7 years ago.
I'll add a check for this case in the next release.
The PDF is missing a data field that is strictly optional but almost never omitted, and the third-party PyPDF2 library does not handle its absence.
Try re-frying (regenerating) the PDF with Ghostscript, as this will likely insert the expected object. Note that this produces a visually identical PDF but will re-encode JPEGs in the process.
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf
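If you need to run this workaround from a Python script rather than by hand, a minimal sketch could wrap the same `gs` invocation with the standard library's `subprocess` module. The function names `build_gs_command` and `regenerate_pdf` are hypothetical helpers for illustration, not part of OCRmyPDF, and the sketch assumes the `gs` binary is on `PATH`.

```python
from __future__ import annotations

import shutil
import subprocess


def build_gs_command(src: str, dst: str) -> list[str]:
    # Same flags as the gs command above: batch mode, no pauses, quiet
    # output, and the pdfwrite device so Ghostscript writes a fresh PDF.
    return [
        "gs",
        "-dBATCH",
        "-dNOPAUSE",
        "-q",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        src,
    ]


def regenerate_pdf(src: str, dst: str) -> None:
    # Fail early with a clear message if Ghostscript is not installed.
    if shutil.which("gs") is None:
        raise FileNotFoundError("Ghostscript ('gs') was not found on PATH")
    # check=True raises CalledProcessError if gs exits with a non-zero status.
    subprocess.run(build_gs_command(src, dst), check=True)
```

Calling `regenerate_pdf("in.pdf", "out.pdf")` is equivalent to running the shell command above; passing the arguments as a list avoids shell-quoting problems with file names that contain spaces.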
Confirmed: rewriting the PDF with Ghostscript is enough for PyPDF2 to handle it.
A direct check in OCRmyPDF would be appreciated, to avoid the manual Ghostscript call.
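One possible shape for such a check, sketched here only as an illustration and not as how OCRmyPDF actually implements it, is "try to parse, and on failure repair once and retry". In this sketch, `read_with_repair` and its `read`/`repair` hooks are hypothetical: `read` stands in for whatever wraps the PDF library, and `repair` for a Ghostscript rewrite. Real code would catch the library's specific error type rather than `Exception`.

```python
from __future__ import annotations

from typing import Callable, TypeVar

T = TypeVar("T")


def read_with_repair(
    path: str,
    read: Callable[[str], T],
    repair: Callable[[str, str], None],
    repaired_path: str,
) -> T:
    """Parse `path`; if that fails, rewrite it once and parse the copy.

    `read` and `repair` are caller-supplied (hypothetical) hooks.
    """
    try:
        return read(path)
    except Exception:
        # Rewrite the file once and retry. If the retry also raises, Python
        # chains the original error as __context__, so both are reported.
        repair(path, repaired_path)
        return read(repaired_path)
```

Keeping the reader and the repair step as parameters means the retry logic can be tested with stubs, without needing Ghostscript or a malformed PDF.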
This happens on a Debian Jessie system running the latest Docker container (see above command line).
Unfortunately, I cannot include the corresponding PDF because it contains private information.
If you need further information, please tell me what to run so I can help you debug this issue. Thank you!