tcurdt closed this issue 1 month ago.
The default behavior is to use Ghostscript to attempt PDF/A conversion. Sometimes, Ghostscript fails to produce a PDF/A and reverts to regular PDF instead. When OCRmyPDF notices this, it reports "seems to be No PDF/A metadata in XMP", that is, the file produced by Ghostscript does not have PDF/A metadata markers, even though we asked for them. Ghostscript is not always good at describing why it failed to produce PDF/A; suffice it to say, the input PDF has some features that prevent PDF/A conversion, as far as Ghostscript is concerned.
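For context on what "PDF/A metadata markers" means: a file claims PDF/A conformance through two fields in its XMP metadata packet, `pdfaid:part` and `pdfaid:conformance`, in the namespace `http://www.aiim.org/pdfa/ns/id/`. Below is a minimal, stdlib-only sketch of checking for that claim. It assumes the XMP packet has already been extracted from the PDF as a string (for example with a PDF library), and the function name `pdfa_claim` is hypothetical; it is not OCRmyPDF's API or its actual implementation. It only reports what the file *claims*, not whether the file really conforms.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs for RDF and the PDF/A identification schema (pdfaid).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def _field(desc: ET.Element, name: str) -> Optional[str]:
    """Read a pdfaid field written either as an attribute or as a child element."""
    tag = f"{{{PDFAID_NS}}}{name}"
    value = desc.get(tag)  # attribute form: <rdf:Description pdfaid:part="2" .../>
    if value is None:
        value = desc.findtext(tag)  # element form: <pdfaid:part>2</pdfaid:part>
    return value.strip() if value else None


def pdfa_claim(xmp_xml: str) -> Optional[str]:
    """Hypothetical helper: return the claimed PDF/A level (e.g. "2B"), or None.

    None corresponds to the situation the OCRmyPDF message describes:
    no PDF/A identification in the XMP metadata.
    """
    root = ET.fromstring(xmp_xml)
    part = conformance = None
    # The fields may be spread over several rdf:Description elements.
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        part = part or _field(desc, "part")
        conformance = conformance or _field(desc, "conformance")
    if not part:
        return None
    # Some parts (e.g. PDF/A-4) may omit a conformance letter.
    return part + (conformance or "")


if __name__ == "__main__":
    sample = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
        pdfaid:part="2" pdfaid:conformance="B"/>
  </rdf:RDF>
</x:xmpmeta>"""
    print(pdfa_claim(sample))  # prints 2B
```

When Ghostscript falls back to regular PDF output, its XMP packet simply lacks these fields, so a check like this returns None.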
As usual, no file means I can't give any specifics, make any recommendations, or fix any bugs. It's a bit like complaining that your web browser failed to render a web page, but you can't provide me a URI or even a screenshot. I'm very tempted to implement a policy of closing such issues without comment. If you're not willing to share information that is essential to fixing an issue, why bother reporting it?
I can sense the frustration in your reply. Sorry about that. I wish I could share the file, but it contains too much privacy-sensitive information. So a debug log was the next best thing.
Given that some errors and improvements may be unrelated to the input file, it would be a shame to require an input file for every issue.
The way I read it, this really is a Ghostscript problem. I would have to ask them what "Detected SMask which must be in DeviceGray, but we are not converting to DeviceGray, reverting to normal PDF output" means. And there isn't really much OCRmyPDF can do about it anyway. Correct?
Maybe it would be a good idea to change the messaging a bit. Instead of
Output file is okay but is not PDF/A (seems to be No PDF/A metadata in XMP)
maybe something along the lines of
It's a valid PDF but Ghostscript failed to convert it into a PDF/A
and maybe even show the error without requiring -v1.
I will improve the error message for the next release.
Describe the bug
I have read through https://github.com/ocrmypdf/OCRmyPDF/issues/490 but I still don't quite understand the message.
Since the target format is PDF/A (the default), why does it not turn the file into a PDF/A? What is preventing that?
It would be really nice to adjust the warning message to give a little more context.
It seems the reason shows up with -v1.
But what is an "SMask", and why does it need to be "DeviceGray"? And what does "No PDF/A metadata in XMP" expect to find?
Steps to reproduce
Files
No response
How did you download and install the software?
Homebrew
OCRmyPDF version
16.4.3
Relevant log output