Closed rbrito closed 1 year ago
I have not read the specs (so I'm not sure whether what I'm describing is "legal" or "illegal"), but the CCITT images in the file above have their dictionaries with /Type /Xobject (with a lowercase o, instead of /XObject), and removing those key-value pairs makes pdfsizeopt compress the corresponding images as JBIG2.
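To make the idea concrete, here is a minimal sketch of how a tool could tolerate the miscased /Type value. It is not the code from pdfsizeopt or from the hack mentioned in this thread; the function names (`is_image_xobject`, `normalize_image_dict`) are hypothetical, and it assumes the object's dictionary is available as raw bytes. It relies on /Subtype /Image to recognize images, since, as far as I understand the PDF specification, /Subtype is required for image XObjects while /Type is optional. The regex matching is deliberately crude and does not parse nested dictionaries or strings.

```python
import re

# "/Type /XObject" where only the value's casing may be wrong (e.g. /Xobject).
# PDF names are case-sensitive, so the key itself must be spelled /Type.
_TYPE_XOBJECT_RE = re.compile(rb'(/Type\s*/)(?i:xobject)(?![A-Za-z0-9])')
_ANY_TYPE_RE = re.compile(rb'/Type\s*/')
_SUBTYPE_IMAGE_RE = re.compile(rb'/Subtype\s*/Image(?![A-Za-z0-9])')


def is_image_xobject(head):
    """Guess whether a raw dictionary (bytes) describes an image XObject.

    Accepts a missing /Type and a miscased /Type /Xobject, as long as
    /Subtype /Image is present.
    """
    if not _SUBTYPE_IMAGE_RE.search(head):
        return False
    if _ANY_TYPE_RE.search(head) is None:
        return True  # No /Type at all: rely on /Subtype alone.
    return _TYPE_XOBJECT_RE.search(head) is not None


def normalize_image_dict(head):
    """Rewrite a miscased /Type value to /XObject in image dictionaries."""
    if not is_image_xobject(head):
        return head
    return _TYPE_XOBJECT_RE.sub(rb'\g<1>XObject', head)
```

Normalizing the value instead of deleting the key keeps the dictionary otherwise untouched, so later stages that look for /Type /XObject would see what they expect.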
A very ugly (but working) hack that you can cherry-pick is currently at: https://github.com/rbrito/pdfsizeopt/commit/5dfacc75ab39e263885db3de249d20676989baed
Regards,
Rogério Brito.
Please add the output of pdfinfo orlin.pdf for a complete report.
I don't know if you are asking me to provide this (the file is readily available here in my original report), but here you go:
$ pdfinfo orlin.pdf
Creator:
Producer:
Tagged: no
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 29
Encrypted: no
Page size: 616.32 x 794.88 pts
Page rot: 0
File size: 1987849 bytes
Optimized: no
PDF version: 1.5
And just to be complete, here goes the output of pdfimages before and after a run with my patched version of pdfsizeopt:
$ pdfimages -list orlin.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 3424 4416 gray 1 1 ccitt no 78 0 401 400 5252B 0.3%
2 1 image 3424 4416 gray 1 1 ccitt no 80 0 401 400 6279B 0.3%
3 2 image 3424 4416 gray 1 1 ccitt no 82 0 401 400 44.7K 2.4%
4 3 image 3456 4448 gray 1 1 ccitt no 84 0 400 400 97.6K 5.2%
5 4 image 3456 4448 gray 1 1 ccitt no 86 0 400 400 60.8K 3.2%
6 5 image 3424 4416 gray 1 1 ccitt no 88 0 401 400 84.7K 4.6%
7 6 image 3424 4416 gray 1 1 ccitt no 90 0 401 400 70.8K 3.8%
8 7 image 3424 4416 gray 1 1 ccitt no 92 0 401 400 62.8K 3.4%
9 8 image 3424 4416 gray 1 1 ccitt no 94 0 401 400 67.3K 3.6%
10 9 image 3424 4416 gray 1 1 ccitt no 96 0 401 400 59.4K 3.2%
11 10 image 3424 4416 gray 1 1 ccitt no 98 0 401 400 63.9K 3.5%
12 11 image 3424 4416 gray 1 1 ccitt no 100 0 401 400 51.7K 2.8%
13 12 image 3424 4416 gray 1 1 ccitt no 102 0 401 400 51.1K 2.8%
14 13 image 3424 4416 gray 1 1 ccitt no 104 0 401 400 50.5K 2.7%
15 14 image 3424 4416 gray 1 1 ccitt no 106 0 401 400 63.2K 3.4%
16 15 image 3424 4416 gray 1 1 ccitt no 108 0 401 400 54.2K 2.9%
17 16 image 3424 4416 gray 1 1 ccitt no 110 0 401 400 47.6K 2.6%
18 17 image 3424 4416 gray 1 1 ccitt no 112 0 401 400 44.6K 2.4%
19 18 image 3424 4416 gray 1 1 ccitt no 114 0 401 400 93.4K 5.1%
20 19 image 3424 4416 gray 1 1 ccitt no 116 0 401 400 79.3K 4.3%
21 20 image 3424 4416 gray 1 1 ccitt no 118 0 401 400 86.6K 4.7%
22 21 image 3456 4448 gray 1 1 ccitt no 120 0 400 400 72.8K 3.9%
23 22 image 3424 4416 gray 1 1 ccitt no 122 0 401 400 81.9K 4.4%
24 23 image 3424 4416 gray 1 1 ccitt no 124 0 401 400 88.6K 4.8%
25 24 image 3424 4416 gray 1 1 ccitt no 126 0 401 400 83.6K 4.5%
26 25 image 3424 4416 gray 1 1 ccitt no 128 0 401 400 73.9K 4.0%
27 26 image 3424 4416 gray 1 1 ccitt no 130 0 401 400 80.0K 4.3%
28 27 image 3424 4416 gray 1 1 ccitt no 132 0 401 400 84.4K 4.6%
29 28 image 3424 4416 gray 1 1 ccitt no 134 0 401 400 30.7K 1.7%
$ pdfimages -list orlin.pso.pdf
page num type width height color comp bpc enc interp object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
1 0 image 3424 4416 gray 1 1 jbig2 no 78 0 401 400 4060B 0.2%
2 1 image 3424 4416 gray 1 1 jbig2 no 80 0 401 400 4992B 0.3%
3 2 image 3424 4416 gray 1 1 jbig2 no 82 0 401 400 34.2K 1.9%
4 3 image 3456 4448 gray 1 1 jbig2 no 84 0 400 400 73.1K 3.9%
5 4 image 3456 4448 gray 1 1 jbig2 no 86 0 400 400 46.3K 2.5%
6 5 image 3424 4416 gray 1 1 jbig2 no 88 0 401 400 64.1K 3.5%
7 6 image 3424 4416 gray 1 1 jbig2 no 90 0 401 400 53.7K 2.9%
8 7 image 3424 4416 gray 1 1 jbig2 no 92 0 401 400 47.5K 2.6%
9 8 image 3424 4416 gray 1 1 jbig2 no 94 0 401 400 50.7K 2.7%
10 9 image 3424 4416 gray 1 1 jbig2 no 96 0 401 400 45.1K 2.4%
11 10 image 3424 4416 gray 1 1 jbig2 no 98 0 401 400 47.1K 2.6%
12 11 image 3424 4416 gray 1 1 jbig2 no 100 0 401 400 38.9K 2.1%
13 12 image 3424 4416 gray 1 1 jbig2 no 102 0 401 400 38.0K 2.1%
14 13 image 3424 4416 gray 1 1 jbig2 no 104 0 401 400 38.5K 2.1%
15 14 image 3424 4416 gray 1 1 jbig2 no 106 0 401 400 46.8K 2.5%
16 15 image 3424 4416 gray 1 1 jbig2 no 108 0 401 400 40.0K 2.2%
17 16 image 3424 4416 gray 1 1 jbig2 no 110 0 401 400 35.3K 1.9%
18 17 image 3424 4416 gray 1 1 jbig2 no 112 0 401 400 34.0K 1.8%
19 18 image 3424 4416 gray 1 1 jbig2 no 114 0 401 400 69.9K 3.8%
20 19 image 3424 4416 gray 1 1 jbig2 no 116 0 401 400 59.8K 3.2%
21 20 image 3424 4416 gray 1 1 jbig2 no 118 0 401 400 65.0K 3.5%
22 21 image 3456 4448 gray 1 1 jbig2 no 120 0 400 400 54.3K 2.9%
23 22 image 3424 4416 gray 1 1 jbig2 no 122 0 401 400 62.4K 3.4%
24 23 image 3424 4416 gray 1 1 jbig2 no 124 0 401 400 67.3K 3.6%
25 24 image 3424 4416 gray 1 1 jbig2 no 126 0 401 400 62.1K 3.4%
26 25 image 3424 4416 gray 1 1 jbig2 no 128 0 401 400 55.0K 3.0%
27 26 image 3424 4416 gray 1 1 jbig2 no 130 0 401 400 60.3K 3.3%
28 27 image 3424 4416 gray 1 1 jbig2 no 132 0 401 400 64.2K 3.5%
29 28 image 3424 4416 gray 1 1 jbig2 no 134 0 401 400 23.2K 1.3%
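For anyone who wants to quantify the difference between the two listings, here is a small sketch that converts the size column into bytes and computes the relative saving for one image. The helper names (`parse_size`, `saving`) are hypothetical, and the assumption that the K suffix means multiples of 1024 bytes should be checked against the pdfimages version in use.

```python
def parse_size(text, kilo=1024):
    """Convert a pdfimages size field such as '5252B' or '44.7K' to bytes.

    Assumption: 'K' and 'M' are multiples of `kilo` (1024 by default).
    """
    units = {'B': 1, 'K': kilo, 'M': kilo ** 2}
    number, suffix = text[:-1], text[-1]
    if suffix not in units:
        raise ValueError('unrecognized size field: %r' % text)
    return float(number) * units[suffix]


def saving(before, after):
    """Return the fraction of bytes saved going from `before` to `after`."""
    return 1.0 - parse_size(after) / parse_size(before)


if __name__ == '__main__':
    # Page 1 of orlin.pdf: CCITT size versus JBIG2 size, from the listings.
    print('page 1 saving: %.1f%%' % (100 * saving('5252B', '4060B')))
```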
@rbrito add issues (Settings) in your fork, please.
@rbrito, hello.
There is a PDF in which the illustrations lack /Type /XObject, but /Subtype /Image is present. Can this be used?
Hi, @zvezdochiot.
@rbrito add issues (Settings) in your fork, please.
Just did that. Please, feel free to submit things there... On the other hand, if you can, please try to reproduce the problem with @pts's version before you file a bug in my repository (the idea is to have issues filed to my repository only if they belong to my own work and not in all copies).
Regarding your second question, I don't have the code here, but I would like to see one such PDF file and see what to do.
Regards,
Rogério Brito.
@rbrito said:
I would like to see one such PDF file and see what to do.
Fixed in 88263ef67fbd7478cf4ed29708f49043343a0fa5.
@zvezdochiot, please file a separate issue if you have a PDF file which pdfsizeopt breaks.
Dear @pts,
I found a file with CCITT images that the latest pdfsizeopt doesn't extract/optimize.
Here is the output of running pdfsizeopt on that file:
Here are the images that are contained in that file:
I'm attaching the file in question here.
orlin.pdf
Thanks,
Rogério Brito.