Open drboone opened 5 years ago
Thank you.
The issue is that the images are marked as having a complex colorspace that ocrmypdf does not recognize, so it takes the precaution of assuming the colorspace is RGB and upgrades all of the images from monochrome to RGB.
You could work around this with pdfimages by outputting to monochrome and then repacking as a PDF.
Not sure when I'll be able to address this.
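For anyone else who hits this, the pdfimages workaround could be sketched roughly as below. This is a minimal sketch, not something ocrmypdf does itself. It assumes poppler's `pdfimages` is on the PATH and writes the 1-bit page images as `.pbm` with its default options, as reported in this issue. Using `img2pdf` to repack is an assumption (any tool that keeps the images bilevel would do), and the helper names `extraction_command`, `repack_command`, and `rebuild` are hypothetical.

```python
import subprocess
from pathlib import Path


def extraction_command(src_pdf, image_root):
    # pdfimages PDF-file image-root: writes image-root-NNN.<ext> files;
    # with default options, 1-bit images come out as .pbm.
    return ["pdfimages", str(src_pdf), str(image_root)]


def repack_command(images, out_pdf):
    # Assumption: img2pdf is installed and accepts the extracted .pbm files.
    return ["img2pdf", *[str(p) for p in images], "-o", str(out_pdf)]


def rebuild(src_pdf, workdir, out_pdf):
    """Extract the page images and repack them into a new PDF."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(extraction_command(src_pdf, workdir / "page"), check=True)
    # Sort by the numeric suffix so page order survives past page-999.
    images = sorted(
        workdir.glob("page-*.pbm"),
        key=lambda p: int(p.stem.rsplit("-", 1)[1]),
    )
    if not images:
        raise RuntimeError("pdfimages produced no .pbm files")
    subprocess.run(repack_command(images, out_pdf), check=True)
```

Calling `rebuild("1st Solutions July 1985.pdf", "work", "rebuilt.pdf")` and then running ocrmypdf on `rebuilt.pdf` is the intended usage. Note that this simple repack places one extracted image per page and does not carry over the original page size or resolution metadata unless the repacking tool is told about them.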
Yes, I rebuilt the PDF trivially. Thanks for looking!
Describe the issue
I'm reporting this much larger output file as requested by the program. If I extract all of the scanned page images from the attached PDF using pdfimages, they come out as .pbm files. However, if I do the same to the PDF produced by ocrmypdf, they come out as .ppm files. Hopefully the attached PDF helps you track down whatever bizarre case I've managed to create.
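The .pbm versus .ppm difference can also be confirmed without relying on file extensions, since Netpbm files start with a two-byte magic number (`P1`/`P4` for PBM, `P2`/`P5` for PGM, `P3`/`P6` for PPM). Here is a small sketch that tallies the formats in a directory of extracted images. The names `netpbm_kind` and `summarize` are hypothetical, and the directory layout is whatever you passed to pdfimages.

```python
from pathlib import Path

# Netpbm magic numbers: plain (ASCII) and raw (binary) variants.
NETPBM_KINDS = {
    b"P1": "pbm", b"P4": "pbm",  # bilevel, 1 bit per pixel in raw form
    b"P2": "pgm", b"P5": "pgm",  # grayscale
    b"P3": "ppm", b"P6": "ppm",  # RGB
}


def netpbm_kind(header: bytes) -> str:
    """Classify a file by the first two bytes of its content."""
    return NETPBM_KINDS.get(header[:2], "unknown")


def summarize(directory):
    """Count how many files of each Netpbm kind a directory holds."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("rb") as fh:
                kind = netpbm_kind(fh.read(2))
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

A raw PBM stores 1 bit per pixel, while a raw PPM with 8-bit samples stores 24 bits per pixel, so a page that moves from the first kind to the second carries far more uncompressed data. That is consistent with the larger output file, though the final size also depends on how the images are compressed inside the PDF.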
To Reproduce
ocrmypdf "1st Solutions July 1985.pdf" out/"1st Solutions July 1985.pdf"
Example file
The culprit PDF is attached.
Additional context
1st Solutions July 1985.pdf