Closed GoogleCodeExporter closed 9 years ago
What default behavior (of pdfsizeopt) would you expect in this case?
Original comment by pts...@gmail.com
on 8 Apr 2012 at 9:27
I think it's nearly impossible to avoid the ``optimized PDF bigger than
original'' case in general, because the original PDF might contain images or
other bulk data with a very cleverly optimized ZIP compression, and when
pdfsizeopt recompresses those objects (with ZIP), they become larger. If that
really bothers you, I can suggest a workaround: add a flag to pdfsizeopt
(disabled by default) so that it will use the original PDF if the optimized one
turns out to be larger. Please request this in another issue if you need that.
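
The fallback described above could be sketched roughly as follows. This is only an illustration: the helper name keep_smaller and its arguments are hypothetical, and no such flag exists in pdfsizeopt at this point.

```python
import os
import shutil


def keep_smaller(original_path, optimized_path, output_path):
    """Copy whichever of the two files is smaller to output_path.

    Hypothetical helper, not part of pdfsizeopt. On a tie the original
    file is kept. Returns the path of the file that was chosen.
    """
    original_size = os.path.getsize(original_path)
    optimized_size = os.path.getsize(optimized_path)
    if optimized_size < original_size:
        chosen = optimized_path
    else:
        chosen = original_path
    shutil.copyfile(chosen, output_path)
    return chosen
```

With a check like this, the output can never be larger than the input, at the cost of sometimes returning the input unchanged.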
Another improvement would be maintaining a cache of (uncompressed, compressed)
stream data pairs, and reusing the cached compressed data if it's smaller than
what pdfsizeopt can produce. This has already been implemented for images. But
even implementing this wouldn't completely avoid the ``optimized PDF bigger
than original'' case; it would just make it rarer.
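
A minimal sketch of such a cache, assuming plain zlib recompression, might look like the code below. The class StreamCache and its methods are hypothetical names for illustration and do not reflect how pdfsizeopt is actually structured.

```python
import zlib


class StreamCache:
    """Maps uncompressed stream data to its smallest known compressed form.

    Hypothetical sketch. Keying by the full uncompressed bytes is simple
    but memory-hungry; a real implementation might key by a hash instead.
    """

    def __init__(self):
        self._best = {}

    def remember(self, uncompressed, compressed):
        """Record a compressed form, e.g. one taken from the input PDF."""
        old = self._best.get(uncompressed)
        if old is None or len(compressed) < len(old):
            self._best[uncompressed] = compressed

    def compress(self, uncompressed):
        """Return the shorter of a fresh zlib result and any cached form."""
        self.remember(uncompressed, zlib.compress(uncompressed, 9))
        return self._best[uncompressed]
```

The idea is to call remember() for every compressed stream while loading the original PDF, and compress() when writing the output, so a cleverly compressed original stream is never replaced by a larger one.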
I've analyzed the example.pdf attached to your previous post. The reason why it
is smaller than the optimized one is that pdfsizeopt (with
--use-multivalent=no) can't generate object streams (/Type/ObjStm). Adding this
feature would be easy, it would solve the problem in this specific case, and it
would be a good general improvement. I'm narrowing the scope of this issue to a
feature request for that.
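
To illustrate why object streams help: many small non-stream objects are packed into one stream and compressed together. The sketch below builds such a stream in simplified form, following the layout from the PDF specification (pairs of object number and offset, then the object bodies). The function build_object_stream is a hypothetical illustration, not the code in pdfsizeopt, and it ignores details such as /Extends and the xref stream entries that must point into the object stream.

```python
import zlib


def build_object_stream(objects):
    """Pack non-stream PDF objects into one /Type/ObjStm stream object.

    `objects` is a list of (object_number, serialized_bytes) pairs, such
    as (12, b"<</Type/Font>>"). Simplified sketch: generation 0 assumed.
    """
    pairs = []
    bodies = []
    pos = 0
    for obj_num, body in objects:
        # Offsets are relative to the first object, i.e. to /First.
        pairs.append(b"%d %d" % (obj_num, pos))
        bodies.append(body)
        pos += len(body) + 1  # +1 for the newline separating objects
    header = b" ".join(pairs) + b"\n"
    payload = header + b"\n".join(bodies)
    compressed = zlib.compress(payload, 9)
    return (b"<</Type/ObjStm/N %d/First %d/Filter/FlateDecode/Length %d>>\n"
            b"stream\n%s\nendstream"
            % (len(objects), len(header), len(compressed), compressed))
```

Because all object bodies share one compression context, repeated dictionary keys such as /Type or /Font cost much less than when each object is written uncompressed.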
Original comment by pts...@gmail.com
on 10 Apr 2012 at 8:43
The original reported issue has been fixed in r183, which adds object stream
generation to pdfsizeopt:
$ ./pdfsizeopt.py --use-multivalent=no example.pdf
info: This is pdfsizeopt.py r183 size=292014.
info: loading PDF from: example.pdf
info: loaded PDF of 4093 bytes
info: found 22 obj offsets and 1 obj streams in xref stream
info: separated to 20 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 2 Type1C fonts loaded
info: saving PDF with 20 objs to: example.pso.pdf
info: generated object stream of 702 bytes in 13 objects (21%)
info: generated 4019 bytes (98%)
However, it's not fixed when Multivalent is enabled:
$ ./pdfsizeopt.py --use-multivalent=yes example.pdf
info: This is pdfsizeopt.py r183 size=292014.
info: loading PDF from: example.pdf
info: loaded PDF of 4093 bytes
info: found 22 obj offsets and 1 obj streams in xref stream
info: separated to 20 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 2 Type1C fonts loaded
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 20 objs to: pso.conv.mi.tmp.pdf
info: generated object stream of 702 bytes in 13 objects (21%)
info: generated 4019 bytes (98%)
info: executing Multivalent to optimize PDF: java -cp .../Multivalent.jar
-Djava.awt.headless=true tool.pdf.Compress -nopagepiece -noalt
pso.conv.mi.tmp.pdf
file:.../pso.conv.mi.tmp.pdf, 4019 bytes
PDF 1.5, producer=xdvipdfmx (0.7.8), creator= XeTeX output 2012.04.03:1909
additional compression may be possible with:
-compact
=> new length = 4818, saved -19%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 4839 bytes (120%)
info: compressed xref stream from 44 to 159 bytes (361%)
info: optimized to 4760 bytes after Multivalent (98%)
info: saving PDF to: example.psom.pdf
info: generated 4760 bytes (116%)
That's because Multivalent has decided not to emit an object stream this time.
I'm keeping the issue open until I implement a workaround for that (i.e.
pdfsizeopt will post-process the output of Multivalent, forcibly creating an
object stream).
Original comment by pts...@gmail.com
on 11 Apr 2012 at 9:05
I've just committed r185, which generates an object stream with
--use-multivalent=yes, even if Multivalent hasn't generated one.
Original comment by pts...@gmail.com
on 15 Apr 2012 at 1:31
Original comment by pts...@gmail.com
on 15 Apr 2012 at 1:32
As of r190, which I've just submitted, pdfsizeopt tries all combinations of
--do-generate-xref-stream= and --do-generate-object-stream= for small files,
and picks the one with the smallest output size. This way the probability that
the optimized PDF is larger than the original is much lower in cases like the
example.pdf attached.
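
The selection step amounts to something like the sketch below. It is only an illustration of the idea: pick_smallest and the render callable are hypothetical names, and the actual code in r190 may be organized differently.

```python
import itertools


def pick_smallest(render):
    """Try every on/off combination of the two options; keep the smallest.

    `render` is a hypothetical callable that takes
    (generate_xref_stream, generate_object_stream) as booleans and
    returns the serialized PDF as bytes.
    Returns ((generate_xref_stream, generate_object_stream), data).
    """
    best = None
    for flags in itertools.product((False, True), repeat=2):
        data = render(*flags)
        if best is None or len(data) < len(best[1]):
            best = (flags, data)
    return best
```

Serializing the file four times is cheap for small inputs, which is presumably why this is only done for small files.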
Again, thank you very much for reporting this issue and providing the
necessary details, so that I could investigate and prepare fixes. I'm closing
this issue now. If you find something which is still wrong (or has since
broken), please comment on the issue, and I'll reopen it.
Original comment by pts...@gmail.com
on 15 Apr 2012 at 7:48
Original issue reported on code.google.com by
TTSten...@gmail.com
on 3 Apr 2012 at 5:11
Attachments: