geimist closed this issue 2 years ago
You can use Ghostscript to fix this particular error that PyPDF2 created:
gs -q -sDEVICE=pdfwrite -o issue979_gs.pdf issue979.pdf
ocrmypdf -dcf issue979_gs.pdf _.pdf
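If the workaround needs to run from a script, the two commands above could be wrapped roughly as follows. This is only a sketch: the function names `workaround_commands` and `run_workaround` are made up for illustration, and it assumes `gs` and `ocrmypdf` are both on the PATH. The arguments are exactly those from the commands above.

```python
import subprocess


def workaround_commands(src, cleaned, out):
    """Build the two command lines from the workaround as argument lists.

    Hypothetical helper: the names are illustrative, the arguments are
    the ones used in the shell commands above.
    """
    # Step 1: let Ghostscript rewrite the PDF with its pdfwrite device.
    gs_cmd = ["gs", "-q", "-sDEVICE=pdfwrite", "-o", str(cleaned), str(src)]
    # Step 2: run OCRmyPDF on the rewritten file with the same -dcf options.
    ocr_cmd = ["ocrmypdf", "-dcf", str(cleaned), str(out)]
    return [gs_cmd, ocr_cmd]


def run_workaround(src, cleaned, out):
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in workaround_commands(src, cleaned, out):
        subprocess.run(cmd, check=True)
```

Calling `run_workaround("issue979.pdf", "issue979_gs.pdf", "_.pdf")` should be equivalent to typing the two commands by hand.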
At this point I won't take any further action because there's an easy workaround. I did not investigate the cause, but Poppler's pdfinfo complains of:
Syntax Error: Can't get Fields array<0a>
In short this PDF is not well-formed and Ghostscript/OCRmyPDF are "within their rights" to reject it as noncompliant.
If this type of error starts popping up more commonly I'll investigate further and decide if it needs to be reported to either PyPDF2 or Ghostscript, and if OCRmyPDF should implement a shim to detect and resolve the issue in advance.
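For anyone who wants to check their own files in the meantime, a detection step along these lines might work. It is a rough sketch, not something OCRmyPDF implements: the helper names are hypothetical, it assumes Poppler's `pdfinfo` is on the PATH and reports the problem on stderr, and it only looks for the one message quoted above.

```python
import subprocess

# The message pdfinfo printed for the file in this issue.
FIELDS_ERROR = "Can't get Fields array"


def mentions_fields_error(pdfinfo_stderr):
    """Return True if pdfinfo's error output contains the Fields array message."""
    return FIELDS_ERROR in pdfinfo_stderr


def needs_ghostscript_rewrite(pdf_path):
    """Hypothetical pre-check: run pdfinfo and look for the known complaint.

    Assumption: pdfinfo writes its syntax errors to stderr.
    """
    result = subprocess.run(
        ["pdfinfo", str(pdf_path)],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate undecodable bytes in the output
    )
    return mentions_fields_error(result.stderr)
```

If `needs_ghostscript_rewrite` returns True, the file could be passed through the Ghostscript command above before running OCRmyPDF.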
The current PyPDF2 version 2.3.1 works fine. Thank you for your support.
Describe the bug
The input file is not processed. OCRmyPDF terminates with the error "SubprocessOutputError: Ghostscript PDF/A rendering failed".
To Reproduce
This error occurs after metadata has been written to a PDF using PyPDF2. If the user then wants OCRmyPDF to process the file again, OCRmyPDF fails.
The fault may not lie with OCRmyPDF, but I don't know how to avoid the error.
Example file
ocr_test_fehler.pdf
System
OCRmyPDF parameter:
Error message: