Closed by stepelu 2 years ago
According to the logs, these images have a Decode table, which works like a color filter applied to the image data. Its basic use is to render an optimized monochrome image in color.
OCRmyPDF skips such images (and, in fact, any image that doesn't look simple and safe, since it is an archival tool): they are uncommon, and modifying them without changing their appearance involves significant complexity.
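To illustrate why a Decode table changes appearance, here is a minimal sketch of the linear mapping the PDF specification describes for an image's Decode array: each raw sample is interpolated between a minimum and a maximum output value. The function name `apply_decode` and its parameters are made up for this example and are not part of OCRmyPDF.

```python
def apply_decode(sample, bits_per_component, decode_min, decode_max):
    """Map a raw image sample to a color component value.

    Hypothetical helper: linear interpolation between decode_min and
    decode_max over the sample range 0 .. 2**bits_per_component - 1.
    """
    max_sample = (1 << bits_per_component) - 1
    return decode_min + sample * (decode_max - decode_min) / max_sample


# 1-bit image with the default gray Decode [0 1]: sample 0 is black, 1 is white.
default = [apply_decode(s, 1, 0.0, 1.0) for s in (0, 1)]

# Same samples with Decode [1 0]: the image is rendered inverted.
inverted = [apply_decode(s, 1, 1.0, 0.0) for s in (0, 1)]

print(default)   # [0.0, 1.0]
print(inverted)  # [1.0, 0.0]
```

A tool that recompresses the samples but drops or mishandles the Decode array would therefore change how the page looks, which is presumably why such images are skipped.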
I might be able to whitelist this particular case if the images are "simple" enough.
I understand. In the meantime I found another procedure that avoids creating these images with the Decode table (they originated from a pdftops rasterization followed by ps2pdf to get a PDF back). So I am closing the ticket, thanks for the prompt reply!
Describe the bug
Some images, which according to pdfimages -list are encoded as: do not get compressed.
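For readers who want to check which encoding each image uses, the following is a rough sketch that pulls the page, color, and enc columns out of a pdfimages -list listing. The sample rows are invented for illustration and are not the output for out.pdf; the helper name `image_encodings` is hypothetical, and the sketch assumes the whitespace-separated column layout printed by poppler's pdfimages.

```python
# Invented sample listing, NOT the real output for out.pdf.
SAMPLE = """\
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2480  3508  gray    1   1  ccitt  no         7  0   300   300  104K 9.8%
   2     1 image    2480  3508  gray    1   8  jpeg   no        12  0   300   300  512K 5.9%
"""


def image_encodings(listing):
    """Return (page, color, enc) tuples from a pdfimages -list listing."""
    lines = listing.splitlines()
    header = lines[0].split()
    color_col = header.index("color")
    enc_col = header.index("enc")
    rows = []
    for line in lines[2:]:  # skip the header and the dashed separator
        fields = line.split()
        if len(fields) <= enc_col:
            continue  # ignore blank or truncated lines
        rows.append((int(fields[0]), fields[color_col], fields[enc_col]))
    return rows


print(image_encodings(SAMPLE))
```

On a real file the listing could be obtained by running pdfimages -list on the PDF and passing its standard output to the function.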
To Reproduce
Consider the PDF file out.pdf, which contains 4 grayscale images on 2 pages.
Running ocrmypdf with
ocrmypdf --jbig2-lossy out.pdf ocr-out.pdf
does not result in a file with compressed images:
The log of ocrmypdf with -v 1 is:
It seems that these images are not considered as candidates for JBIG2 compression.
Example file
See above.
Expected behavior
The images are grayscale, so with the default optimization level 1 they should get compressed to lossy JBIG2 (given the passed options). This happens, for instance, for images that are encoded as ccitt, as reported by pdfimages -list, i.e. this case works fine.
System