pts / pdfsizeopt

PDF file size optimizer
GNU General Public License v2.0

Problems processing fonts from a PDF #135

Closed rbrito closed 1 year ago

rbrito commented 4 years ago

Dear @pts,

Springer recently released a book as an open book, and running pdfsizeopt on it (your version, no modifications, your supported Ghostscript version, etc.) fails while optimizing the fonts:

$ ./pdfsizeopt.single --v=999 --use-multivalent=no --do-optimize-images=no --do-debug-gs=yes 2020_Book_.pdf 
info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /tmp
info: PATH: /tmp:/home/rbrito/bin:/usr/lib/ccache:/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/games
info: getcwd: /tmp
info: verifying Ghostscript: gs-pso
info: output from Ghostscript: 'GPL Ghostscript 9.05 (2012-02-08)\nCopyright (C) 2010 Artifex Software, Inc.  All rights reserved.\nThis software comes with NO WARRANTY: see the file PUBLIC for details.\n/GSOK\n'
info: Ghostscript version info: 'GPL Ghostscript 9.05 (2012-02-08)'
info: using Ghostscript /home/rbrito/bin/gs-pso: GPL Ghostscript 9.05 (2012-02-08)
info: found working Ghostscript: gs-pso
info: loading PDF from: 2020_Book_.pdf
info: loaded PDF of 6166598 bytes
info: separated to 9239 objs + xref + trailer
info: parsed 9239 objs
info: eliminated 2 unused objs, depth=20
info: found 0 Type1 fonts loaded
info: found 55 Type1C fonts loaded
info: writing Type1CParser (192414 font bytes) to: psotmp.3359.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs-pso -q -P- -dNOPAUSE -dBATCH -sDEVICE=nullpage -sDataFile=psotmp.3359.conv.parsedata.tmp.ps -f psotmp.3359.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 905 20120208
Error: /undefined in R
Operand stack:
   --nostringval--   --nostringval--   Filter   FlateDecode   Length   4011   Metadata   15   0
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1894   1   3   %oparray_pop   1893   1   3   %oparray_pop   1877   1   3   %oparray_pop   1771   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:1154/1684(ro)(G)--   --dict:0/20(G)--   --dict:99/200(L)--
Current allocation mode is local
Current file position is 16527
GPL Ghostscript 9.05: Unrecoverable error, exit code 1
fatal: Type1CParser failed, status=0x100
$ 

I'm attaching the files in question here (the PS files had to be compressed, or GitHub wouldn't allow me to attach them).

2020Book.pdf psotmp.3359.conv.parse.tmp.ps.gz psotmp.3359.conv.parsedata.tmp.ps.gz

I can work around the problem by telling pdfsizeopt not to optimize the fonts (but that, of course, is not a real fix):

$ ./pdfsizeopt.single --v=999 --use-multivalent=no --do-optimize-images=no --do-debug-gs=yes --do-optimize-fonts=no 2020_Book_.pdf 
info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /tmp
info: PATH: /tmp:/home/rbrito/bin:/usr/lib/ccache:/usr/local/bin:/usr/local/sbin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/games
info: getcwd: /tmp
info: verifying Ghostscript: gs-pso
info: output from Ghostscript: 'GPL Ghostscript 9.05 (2012-02-08)\nCopyright (C) 2010 Artifex Software, Inc.  All rights reserved.\nThis software comes with NO WARRANTY: see the file PUBLIC for details.\n/GSOK\n'
info: Ghostscript version info: 'GPL Ghostscript 9.05 (2012-02-08)'
info: using Ghostscript /home/rbrito/bin/gs-pso: GPL Ghostscript 9.05 (2012-02-08)
info: found working Ghostscript: gs-pso
info: loading PDF from: 2020_Book_.pdf
info: loaded PDF of 6166598 bytes
info: separated to 9239 objs + xref + trailer
info: parsed 9239 objs
info: eliminated 2 unused objs, depth=20
info: optimized 3122 streams, kept 16 #orig, 2147 uncompressed, 959 zip
info: eliminated 2911 duplicate objs
info: compressed 1145 streams, kept 0 of them uncompressed
info: saving PDF with 6326 objs to: 2020_Book_.pso.pdf
info: generated object stream of 120713 bytes in 4191 objects (9%)
info: generated 4945580 bytes (80%)
$

It would be great if you could help with this situation.

Thanks,

Rogério Brito.

pts commented 1 year ago

Thank you for reporting this! I can reproduce the issue: I get the same error message.

pts commented 1 year ago

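This is a bug in pdfsizeopt. The /Metadata 15 0 R entry within 16 obj << ... >> triggers it: the operand stack in the error above ends with Metadata 15 0, and the R that follows is what Ghostscript reports as undefined. Type1CParser should ignore this entry.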

pts commented 1 year ago

Fixed in 6d16c5bae46b58d4165fbf43e5417cee72ae5ff4. The fix changes which entries are passed to Type1CParser. Since only the stream was relevant, only /Length, /Filter and /DecodeParms are passed now; /Metadata isn't.
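
To illustrate the idea behind the fix, here is a minimal Python sketch. It is not the code from the commit: the plain-dict representation of the object and the helper names `keep_stream_entries` and `has_indirect_ref` are assumptions made for illustration only. The entry values are taken from the operand stack in the error message above.

```python
import re

# Entries needed to decode the font stream; everything else is dropped.
STREAM_KEYS = ('Length', 'Filter', 'DecodeParms')

# Matches a PDF indirect reference such as "15 0 R".
INDIRECT_REF_RE = re.compile(r'\b\d+\s+\d+\s+R\b')


def keep_stream_entries(obj_dict):
    """Return a copy of obj_dict containing only the stream-decoding entries."""
    return {key: value for key, value in obj_dict.items()
            if key in STREAM_KEYS}


def has_indirect_ref(obj_dict):
    """Return True if any value contains an indirect reference (N G R)."""
    return any(INDIRECT_REF_RE.search(str(value))
               for value in obj_dict.values())


# Hypothetical stand-in for the font object dictionary from the report.
font_obj = {
    'Filter': '/FlateDecode',
    'Length': 4011,
    'Metadata': '15 0 R',  # The entry that caused "/undefined in R".
}

filtered = keep_stream_entries(font_obj)
print(filtered)  # {'Filter': '/FlateDecode', 'Length': 4011}
```

With an allowlist like this, an unrelated entry such as /Metadata never reaches the PostScript side, so its indirect reference cannot break parsing there.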