pts / pdfsizeopt

PDF file size optimizer
GNU General Public License v2.0

warning: cannot parse obj 37: obj header (X Y obj) expected, got '37 0 obj\n #132

Closed vdmavendodu closed 1 year ago

vdmavendodu commented 4 years ago

Hi,

Please find my source and output PDFs attached; the two look different. Let me know if any further information is required.

Thanks & Regards, Mahesh Yadav.

input.pdf output.pdf

zvezdochiot commented 4 years ago
$ mutool info input.pdf 
input.pdf:

PDF-1.5
Info object (6 0 R):
<</Creator(HP Exstream Version 9.5.301 32-bit \(DBCS\))/CreationDate(7/18/2019 15:45:08)/Author(Registered to: RBLBANK )/Title(Credit Cards Hajira)>>
Pages: 5

Retrieving info from pages 1-5...
Fonts (69):
...
$ mutool info output.pdf 
output.pdf:

PDF-1.5
Info object (9 0 R):
<</Creator(HP Exstream Version 9.5.301 32-bit \(DBCS\))/CreationDate(7/18/2019 15:45:08)/Author(Registered to: RBLBANK )/Title(Credit Cards Hajira)>>
Pages: 5

Retrieving info from pages 1-5...
warning: not a font dict (0 0 R)
...
$ pdftk input.pdf cat 1-end output input.tk.pdf uncompress
$ pdfsizeopt input.tk.pdf 
info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /usr/bin
info: loading PDF from: input.tk.pdf
info: loaded PDF of 21093895 bytes
info: separated to 344 objs + xref + trailer
info: parsed 344 objs
info: found 69 Type1 fonts loaded
info: writing Type1CConverter (1376619 font bytes) to: psotmp.9883.conv.tmp.ps
...
fatal: Type1CConverter failed, status=0x100
$ ps2pdf input.pdf input.ps.pdf 
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.
   **** Warning: can't process font stream, loading font by the name.

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

$ mutool info input.ps.pdf 
input.ps.pdf:

PDF-1.4
Info object (2 0 R):
<</Producer(GPL Ghostscript 9.05)/CreationDate(D:20190927101403+03'00')/ModDate(D:20190927101403+03'00')>>
Pages: 5

Retrieving info from pages 1-5...
Mediaboxes (1):
    1   (5 0 R):    [ 0 0 1009 612 ]

Fonts (10):
...
vdmavendodu commented 4 years ago

Is the issue in the PDF?

I don't get your point. Can you please elaborate on the issue?

maadjordan commented 4 years ago

There is a font issue with "input.pdf", and PSO (pdfsizeopt) processes it without noticing. The error can be revealed by processing the file with Ghostscript, Qpdf or CPDF. PSO has no repair mechanism for such files, and I don't know how to make PSO skip Type 1 font processing and still complete the other steps.
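
As an illustration of revealing the problem with Ghostscript, here is a minimal Python sketch that summarizes the "**** Warning:" lines from captured Ghostscript output (such as the ps2pdf transcript above). The function names are hypothetical, and the only assumption about Ghostscript is the message format already shown in this thread.

```python
import re

# Ghostscript prefixes its PDF diagnostics with "****", as seen in the
# ps2pdf transcript quoted in this thread.
GS_WARNING_RE = re.compile(r"^\s*\*{4}\s+Warning:\s*(.*)$")


def summarize_gs_warnings(gs_output):
    """Return {warning message: count} for '**** Warning:' lines."""
    counts = {}
    for line in gs_output.splitlines():
        match = GS_WARNING_RE.match(line)
        if match:
            message = match.group(1).strip()
            counts[message] = counts.get(message, 0) + 1
    return counts


def looks_damaged(gs_output):
    """True if Ghostscript warned or reported that it repaired the file."""
    return bool(summarize_gs_warnings(gs_output)) or (
        "This file had errors that were repaired or ignored" in gs_output
    )
```

Feeding it the text captured from a ps2pdf run gives a quick yes/no answer plus a count per distinct warning, which is easier to scan than fifteen identical lines.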

sundarciet commented 4 years ago

@maadjordan : Is it possible to solve this with this API?

sundarciet commented 4 years ago

Is there any possible solution to fix this issue?

zvezdochiot commented 4 years ago

@sundarciet said:

Any possible solution to fix this issue ?

Use GhostScript:

ps2pdf input.pdf input.ps.pdf

or

gs -P- -dSAFER -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=input.ps.pdf -c .setpdfwrite -f input.pdf

pts commented 1 year ago
info: writing Type1CConverter (1376619 font bytes) to: psotmp.9883.conv.tmp.ps
...
fatal: Type1CConverter failed, status=0x100

You have replaced the most relevant part of the error message with "...". Could you please copy-paste the full message?

Anyway, this looks like a Ghostscript compatibility issue. To resolve it, use an older version of Ghostscript with pdfsizeopt. The easiest way is installing pdfsizeopt by following the install instructions at https://github.com/pts/pdfsizeopt . More details here: https://github.com/pts/pdfsizeopt/issues/157

pts commented 1 year ago

Alternatively, you can use pdfsizeopt --do-optimize-fonts=no as a workaround.

pts commented 1 year ago

pdfsizeopt has other problems (unrelated to Ghostscript) with the attached input.pdf file:

warning: cannot parse obj 37: obj header (X Y obj) expected, got '37          0 obj\n<<\n/Type /Font' at ofs=32018
warning: cannot parse obj 38: obj header (X Y obj) expected, got '38          0 obj\n<<\n/Type /Font' at ofs=33055
warning: cannot parse obj 39: obj header (X Y obj) expected, got '39          0 obj\n<</Filter /Fla' at ofs=33252
warning: cannot parse obj 40: obj header (X Y obj) expected, got '40          0 obj\n<<\n/Type /Font' at ofs=60433
warning: cannot parse obj 41: obj header (X Y obj) expected, got '41          0 obj\n<<\n/Type /Font' at ofs=61363
warning: cannot parse obj 42: obj header (X Y obj) expected, got '42          0 obj\n<</Filter /Fla' at ofs=61562

I'm looking into this.

pts commented 1 year ago

The issues

warning: cannot parse obj 37: obj header (X Y obj) expected, got '37          0 obj\n<<\n/Type /Font' at ofs=32018
warning: cannot parse obj 38: obj header (X Y obj) expected, got '38          0 obj\n<<\n/Type /Font' at ofs=33055
warning: cannot parse obj 39: obj header (X Y obj) expected, got '39          0 obj\n<</Filter /Fla' at ofs=33252
warning: cannot parse obj 40: obj header (X Y obj) expected, got '40          0 obj\n<<\n/Type /Font' at ofs=60433
warning: cannot parse obj 41: obj header (X Y obj) expected, got '41          0 obj\n<<\n/Type /Font' at ofs=61363
warning: cannot parse obj 42: obj header (X Y obj) expected, got '42          0 obj\n<</Filter /Fla' at ofs=61562

have been fixed in fd7ae0e458a06dc32f93639ac60890f82755b47a.
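
For readers wondering what kind of input triggers these warnings: the headers above have a run of spaces between the object number and the generation number. The sketch below is a hypothetical, lenient header matcher in Python; it is not the code from the commit, and the names OBJ_HEADER_RE and parse_obj_header are made up for illustration.

```python
import re

# Hypothetical pattern (not pdfsizeopt's actual code): allow any run of PDF
# whitespace between the object number, the generation number and "obj".
OBJ_HEADER_RE = re.compile(
    rb"(\d+)[\x00\t\n\x0c\r ]+(\d+)[\x00\t\n\x0c\r ]+obj(?![0-9A-Za-z])"
)


def parse_obj_header(data):
    """Return (obj_num, generation) if data starts with an obj header, else None."""
    match = OBJ_HEADER_RE.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

With this pattern both "37 0 obj" and "37          0 obj" parse to the same (37, 0) pair, while a token such as "object" is rejected by the lookahead.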

However, pdfsizeopt still fails for the attached input.pdf:

info: executing Type1CConverter with Ghostscript: TMPDIR=. TEMP=. gs -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/printer -dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=psotmp.4847.conv.tmp.pdf -f psotmp.4847.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 905 20120208
Type1CConverter: converting font /F2 to /Obj0000000038
Type1CConverter: converting font /F2_1 to /Obj0000000041
Type1CConverter: converting font /F2_2 to /Obj0000000044
Type1CConverter: converting font /F2_30 to /Obj0000000047
Type1CConverter: converting font /F2_32 to /Obj0000000050
Type1CConverter: converting font /F2_251 to /Obj0000000053
Type1CConverter: converting font /F3 to /Obj0000000056
Type1CConverter: converting font /F3_1 to /Obj0000000059
Type1CConverter: converting font /F3_2 to /Obj0000000062
Type1CConverter: converting font /F3_30 to /Obj0000000065
Type1CConverter: converting font /F3_32 to /Obj0000000068
Type1CConverter: converting font /F3_251 to /Obj0000000071
Error: /undefined in Neither

Also, Evince displays the error message "some font thing failed" for input.pdf, so the fonts in input.pdf are most probably broken (corrupt).

As a workaround, run pdfsizeopt --do-optimize-fonts=no.
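
If you batch-process many files, the workaround can be applied only when needed. The following Python sketch tries a normal run first and retries with --do-optimize-fonts=no on failure. It assumes that pdfsizeopt is on PATH, that it is called as "pdfsizeopt [flags] input.pdf output.pdf", and that it exits with a non-zero status on a fatal error; the helper names are hypothetical.

```python
import subprocess


def build_pdfsizeopt_command(input_path, output_path, optimize_fonts=True):
    """Build the argument list; the flag is the one named in this thread."""
    command = ["pdfsizeopt"]
    if not optimize_fonts:
        command.append("--do-optimize-fonts=no")
    command.extend([input_path, output_path])
    return command


def optimize_with_fallback(input_path, output_path, run=subprocess.call):
    """Try a full run first; on failure, retry without font optimization.

    Returns the command that succeeded, or None if both attempts failed.
    Assumption: a non-zero exit status means the run failed.
    """
    for optimize_fonts in (True, False):
        command = build_pdfsizeopt_command(input_path, output_path, optimize_fonts)
        if run(command) == 0:
            return command
    return None
```

The run parameter exists only so the retry logic can be exercised without pdfsizeopt installed; in normal use, call optimize_with_fallback("input.pdf", "output.pdf") and check for None.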