SvennoNito closed this issue 6 years ago.
It's not exactly the same size: the output PDF is 92207 bytes smaller than the input PDF.
One possible reason is that the PDF contains only vector graphics (no bitmap images, no fonts). pdfsizeopt doesn't have an optimization algorithm for vector graphics, so it keeps vector graphics intact (except for recompressing them).
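To illustrate what "recompressing without optimizing" means, here is a minimal Python sketch using the standard zlib module. It assumes a plain FlateDecode content stream: the stream is inflated and deflated again at the highest compression level, so the drawing operators are byte-for-byte identical after decompression and only the compressed encoding can change. The helper name recompress_flate and the sample stream are hypothetical; this is not pdfsizeopt's actual recompression code, which may use different tools.

```python
import zlib

def recompress_flate(compressed: bytes) -> bytes:
    """Hypothetical helper: inflate a FlateDecode stream, deflate it at level 9.

    The decompressed content (e.g. vector drawing operators) is unchanged;
    only its compressed representation may get smaller.
    """
    raw = zlib.decompress(compressed)
    return zlib.compress(raw, 9)

# Made-up content stream with simple vector drawing operators.
content = b"0 0 m 100 100 l S\n" * 200
original = zlib.compress(content, 1)  # weakly compressed input
recompressed = recompress_flate(original)

# The round trip is lossless: the vector graphics themselves are untouched.
assert zlib.decompress(recompressed) == content
print(len(original), len(recompressed))
```

Because the drawing operators are kept as they are, a PDF made mostly of vector graphics can only shrink by whatever the better compression gains.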
Another possible reason is that the PDF file contains lots of TrueType or OpenType fonts. Again, pdfsizeopt doesn't have an optimization algorithm for these font types, so it just keeps these fonts intact.
Another possible reason is that there is a bug in pdfsizeopt, and it doesn't notice some fonts or bitmap images that could be optimized.
To get a more accurate explanation, you may want to run pdfsizeopt --stats C:\pdfs\MA.pdf and copy-paste the output into this bug report, or upload the input PDF here.
A possible improvement would be adding an info message like this:
info: keeping X bytes in X content streams (vector graphics), X bytes in X TrueType/OpenType fonts, X bytes in X other objs intact
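A sketch of how such a message could be assembled, assuming the per-category byte and object counts are already available. The function keep_intact_message, the category keys and the numbers in the usage example are hypothetical and made up for illustration; pdfsizeopt does not necessarily collect its statistics this way.

```python
def keep_intact_message(stats: dict) -> str:
    """Format the proposed 'keeping ... intact' info message.

    `stats` maps a hypothetical category key to a (byte_count, object_count)
    pair for objects that pdfsizeopt would leave unoptimized.
    """
    vec_bytes, vec_count = stats["content_streams"]
    font_bytes, font_count = stats["truetype_opentype_fonts"]
    other_bytes, other_count = stats["other_objs"]
    return (
        f"info: keeping {vec_bytes} bytes in {vec_count} content streams "
        f"(vector graphics), {font_bytes} bytes in {font_count} "
        f"TrueType/OpenType fonts, {other_bytes} bytes in {other_count} "
        f"other objs intact"
    )

# Made-up numbers, only to show the resulting message.
message = keep_intact_message({
    "content_streams": (1200, 3),
    "truetype_opentype_fonts": (0, 0),
    "other_objs": (450, 5),
})
print(message)
```

Reporting bytes and object counts together would make it clear at a glance which part of a PDF pdfsizeopt could not touch.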
Thanks, pts! My PDF includes no vector graphics, but about 20 .png graphics, each smaller than 1 MB. When I run C:\pdfsizeopt\pdfsizeopt --stats C:\pdfs\MA.pdf
I get:
info: This is pdfsizeopt ZIP rUNKNOWN size=68657.
info: computing statistics for PDF: C:\pdfs\MA.pdf
info: PDF size is 124154212 bytes
info: stat drawing_objs = 31707 bytes (0.03%)
info: stat font_data_objs = 0 bytes (0.00%)
info: stat footer = 26 bytes (0.00%)
info: stat header = 9 bytes (0.00%)
info: stat jpeg_image_objs = 124073372 bytes (99.93%)
info: stat linearized_xref = 0 bytes (0.00%)
info: stat nonjpeg_image_objs = 0 bytes (0.00%)
info: stat other_nonstream_objs = 29235 bytes (0.02%)
info: stat other_stream_objs = 0 bytes (0.00%)
info: stat trailer = 50 bytes (0.00%)
info: stat wasted_between_objs = 1 bytes (0.00%)
info: stat xref = 19812 bytes (0.02%)
info: end of stats
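The stats can be checked mechanically: each percentage is a category's byte count divided by the total PDF size, and in this output the categories add up to exactly the 124154212 bytes reported. A minimal Python sketch, assuming the "info: stat NAME = N bytes (P%)" line format shown above; parse_stats is a hypothetical helper, not part of pdfsizeopt.

```python
import re

# Matches lines such as "info: stat drawing_objs = 31707 bytes (0.03%)".
STAT_RE = re.compile(r"info: stat (\w+) = (\d+) bytes \(([\d.]+)%\)")

def parse_stats(log: str) -> dict:
    """Hypothetical helper: return {category: byte_count} from stat lines."""
    return {m.group(1): int(m.group(2)) for m in STAT_RE.finditer(log)}

log = """\
info: PDF size is 124154212 bytes
info: stat drawing_objs = 31707 bytes (0.03%)
info: stat font_data_objs = 0 bytes (0.00%)
info: stat footer = 26 bytes (0.00%)
info: stat header = 9 bytes (0.00%)
info: stat jpeg_image_objs = 124073372 bytes (99.93%)
info: stat linearized_xref = 0 bytes (0.00%)
info: stat nonjpeg_image_objs = 0 bytes (0.00%)
info: stat other_nonstream_objs = 29235 bytes (0.02%)
info: stat other_stream_objs = 0 bytes (0.00%)
info: stat trailer = 50 bytes (0.00%)
info: stat wasted_between_objs = 1 bytes (0.00%)
info: stat xref = 19812 bytes (0.02%)
"""

total = int(re.search(r"PDF size is (\d+) bytes", log).group(1))
stats = parse_stats(log)

# The categories account for every byte of the file.
assert sum(stats.values()) == total

share = 100 * stats["jpeg_image_objs"] / total
print(f"{share:.2f}%")  # 99.93%
```

Everything outside jpeg_image_objs amounts to only about 80 KB of a roughly 124 MB file.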
That's interesting. I assume that jpeg_image_objs at 99% means that almost all of the PDF's size comes from images?
Yes, this PDF contains many JPEG images (probably each page is one big image). pdfsizeopt doesn't contain any algorithm to make JPEG images smaller, so it just copies them around.
To make the PDF smaller, it would be possible to downscale (resize) the JPEG images and/or to recompress them with a lower quality setting. However, pdfsizeopt can't do so (one reason is that, by design, pdfsizeopt performs only visually lossless transformations), and this feature is unlikely to be introduced soon unless someone volunteers to implement it.
Hey community, I'm trying to optimize a very large PDF (>100 MB) on Windows using
C:\pdfsizeopt\pdfsizeopt C:\pdfs\MA.pdf C:\pdfs\MA_optimized.pdf
It runs to completion, but the output file has the same size as my input file. These are the info messages that I get. What am I doing wrong?