Closed macdeport closed 6 months ago
Thanks for attaching the file. While it's sometimes possible to identify an issue by looking at QPDF JSON, in this particular case, the issue involves data in the original PDF. The original PDF is also probably malformed - it looks like there is a content stream that does not have the appropriate number of elements in a matrix, so at least some portion of it isn't going to render correctly.
You could try using Ghostscript to rewrite the PDF. It may be able to correct the issue or discard the malformed content:
gs -q -sDEVICE=pdfwrite -o out.pdf in.pdf
The PDF has many errors and there's no way to recover it. At the point of failure, there's supposed to be a 6-element coordinate matrix that sets up what to draw next, but only 3 elements are there. There's just no way to know what's supposed to happen.
I added a more descriptive error message.
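To illustrate the kind of defect described above, here is a minimal sketch (not OCRmyPDF's or pikepdf's actual check) that scans already-decoded content stream text for a `cm` operator preceded by anything other than six numeric operands. The function name `find_short_matrices` and the sample strings are hypothetical, and the whitespace tokenizer is a deliberate simplification that ignores strings, arrays, and inline images.

```python
import re

# Expected operand counts per operator; this sketch only checks `cm`,
# which takes the six numbers a b c d e f of a transformation matrix.
EXPECTED_OPERANDS = {"cm": 6}

# Matches PDF-style numbers such as 1, -72, 0.5, .5, and 1.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def find_short_matrices(content: str):
    """Return (token_index, operand_count) for each `cm` whose number of
    preceding numeric operands is not 6. Hypothetical helper for illustration."""
    problems = []
    operands = []
    for index, token in enumerate(content.split()):
        if NUMBER.fullmatch(token):
            operands.append(token)
            continue
        expected = EXPECTED_OPERANDS.get(token)
        if expected is not None and len(operands) != expected:
            problems.append((index, len(operands)))
        # Any non-numeric token is treated as consuming the pending operands.
        operands = []
    return problems


# Hypothetical sample streams: one well-formed, one with a 3-element matrix.
good = "q 1 0 0 1 72 720 cm Q"
bad = "q 1 0 0 cm Q"

print(find_short_matrices(good))
print(find_short_matrices(bad))
```

A check like this can only report where the matrix is short. It cannot reconstruct the three missing values, which is why the file cannot be repaired automatically.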
Describe the bug
How did you download and install the software?
MacPorts
(BTW not offered in the drop-down menu below...) Run ocrmypdf bid\$pdf bid_.pdf
=> "crash" on this particular file bid$.pdf
Steps to reproduce
Files
bid-240430.json
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
ocrmypdf 16.2.0
Relevant log output