zackw / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt

Multivalent: java.io.IOException: invalid distance too far back @ 0 #79

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Hi, Peter.

I'm getting a stack trace when running pdfsizeopt revision 244 with some files:

----

info: This is pdfsizeopt.py rUNKNOWN size=318356.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: Galois.pdf
info: loaded PDF of 518573 bytes
warning: problem with xref table: xref table not found at 508214
warning: trying to load objs without the xref table
info: separated to 513 objs + trailer
info: found 0 Type1 fonts loaded
info: found 30 Type1C fonts loaded
info: writing Type1CParser (90093 font bytes) to: pso.conv.parse.tmp.ps
info: using Ghostscript gs: GPL Ghostscript 9.05 (2012-02-08)
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH 
-sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 905 20120208
Type1CParser: all OK
info: parsed 30 Type1C fonts
info: eliminated 5 duplicate objs
info: saving PDF with 508 objs with Multivalent to: Galois.psom.pdf
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: generated object stream of 8541 bytes in 364 objects (9%)
info: written 462924 bytes to Multivalent input PDF: pso.conv.mi.tmp.pdf
info: executing Multivalent to optimize PDF: /usr/bin/java -cp 
/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/Multivalent.jar 
-Djava.awt.headless=true tool.pdf.Compress -nopagepiece -noalt -mon 
pso.conv.mi.tmp.pdf
file:/home/rbrito/Dropbox/documents-to-sort-out/pso.conv.mi.tmp.pdf, 462924 
bytes
PDF 1.5, producer=MiKTeX-xdvipdfmx (0.7.8), creator= XeTeX output 
2012.12.16:1357
511 objects / 106 pagesjava.io.IOException: invalid distance too far back @ 0 
while reading object #204: {Filter=FlateDecode, DATA=118811, Length=3781}
pso.conv.mi.tmp.pdf: java.io.IOException: invalid distance too far back @ 0
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 0 bytes (0%)
Traceback (most recent call last):
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7887, in <module>
    main(sys.argv)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7880, in main
    is_flate_ok=not do_decompress_flate)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7579, in Save
    multivalent_java=multivalent_java)
  File "/home/rbrito/Desktop/mirrors/pdfsizeopt/trunk/pdfsizeopt.py", line 7513, in _RunMultivalent
    'Multivalent generated empty output (see its error above)')
AssertionError: Multivalent generated empty output (see its error above)

----

I don't know whether the problem here is with Multivalent or with 
pdfsizeopt, but I have been getting this java.io.IOException a lot with some 
new PDF files that I am trying (yes, I am now hitting pdfsizeopt quite hard).

The offending file is attached.

Please let me know if any other information is needed.

Thanks.

Original issue reported on code.google.com by rbr...@gmail.com on 26 Feb 2013 at 2:44

GoogleCodeExporter commented 8 years ago
Thank you for reporting this bug, and thank you for attaching the sample input 
PDF. I could reproduce the problem. Indeed there is something wrong in 
Multivalent, and pdfsizeopt doesn't recover from it. I'll take a closer look 
later.

Please note that the Galois.pdf you have attached seems to be invalid: evince 
can't display page 37 properly, see the attached screen shot.

In order to isolate bugs in pdfsizeopt, could you please upload a valid sample 
PDF (possibly by regenerating it without page 37) for which it fails?

In the meantime, you can run `pdfsizeopt --use-multivalent=no' (without the 
quotes) as a workaround, but it won't fix your PDF if it was already broken.

Original comment by pts...@gmail.com on 26 Feb 2013 at 5:13

GoogleCodeExporter commented 8 years ago
The attached Galois.pdf file is corrupt: object 136, of /Length 3781, contains 
a corrupt /FlateDecode stream which cannot be decompressed.
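
A minimal sketch of how one might check a single stream for this kind of corruption, assuming the raw bytes of the /FlateDecode stream have already been extracted from the PDF. The function name `flate_stream_is_valid` is made up for illustration and is not part of pdfsizeopt; the check uses only Python's standard `zlib` module.

```python
import zlib


def flate_stream_is_valid(data):
    """Return True if data is a complete, decodable zlib (FlateDecode) stream.

    Hypothetical helper for illustration only; not part of pdfsizeopt.
    """
    decompressor = zlib.decompressobj()
    try:
        decompressor.decompress(data)
        decompressor.flush()
    except zlib.error:
        # Raised for a bad header, bad checksum, or invalid compressed data.
        return False
    # eof stays False if the stream was truncated before its end marker.
    return decompressor.eof
```

A stream for which this returns False would need to be regenerated or repaired before any optimizer can be expected to handle it.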

The behavior and output of pdfsizeopt are not defined when it receives invalid 
input (such as Galois.pdf). All I can do for this issue is to improve the error 
message pdfsizeopt prints a bit.

To get this PDF optimized, please regenerate it correctly first, or run it 
through a converter which removes invalid parts, and run pdfsizeopt only after 
that.

Original comment by pts...@gmail.com on 27 Feb 2013 at 7:46

GoogleCodeExporter commented 8 years ago
Hi, Peter.

I just got another copy of the document from the author and this new one is 
fine.

I guess that what we can take from this episode is that pdfsizeopt could print 
an error message instead of dumping a stack trace.

Thanks.

Original comment by rbr...@gmail.com on 27 Feb 2013 at 8:30

GoogleCodeExporter commented 8 years ago
pdfsizeopt does print a useful error message: ``Multivalent generated empty 
output (see its error above)''. It also prints a stack trace, which is even 
more useful, because it can be copy-pasted to the issue tracker. Removing the 
stack trace would make the output less useful, thus worse. Making this 
particular error message (or the corresponding Multivalent error message) more 
useful would be too much work. Maybe I could add an ``Is the input PDF 
corrupt?'' clause here, but that's also too much work to do consistently, 
because PDFs can be corrupt in many ways.

The easy improvement is to add the following sentence to the documentation: 
``If your input PDF is corrupt, pdfsizeopt may succeed or it may fail, possibly 
with an error message which is difficult to understand. If you think your PDF 
is correct, then please report a bug in the pdfsizeopt issue tracker.''
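
For illustration only, here is a rough sketch of what appending such a clause could look like for this one failure mode, while still letting the exception propagate so the stack trace is kept. The helper name `check_tool_output` and the constant `CORRUPT_INPUT_HINT` are hypothetical and are not taken from pdfsizeopt.py; this does not address the consistency concern raised above.

```python
import os

# Hypothetical hint text, borrowed from the clause suggested in the comment.
CORRUPT_INPUT_HINT = 'Is the input PDF corrupt?'


def check_tool_output(output_path, tool_name='Multivalent'):
    """Return the output size, or raise if the file is missing or empty.

    Hypothetical helper for illustration only; not part of pdfsizeopt.
    The exception is not caught here, so the usual traceback still appears.
    """
    size = os.path.getsize(output_path) if os.path.exists(output_path) else 0
    if size == 0:
        raise RuntimeError(
            '%s generated empty output (see its error above). %s'
            % (tool_name, CORRUPT_INPUT_HINT))
    return size
```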

Do you have any specific suggestions on how to better report the failure in 
this particular case?

Original comment by pts...@gmail.com on 3 Mar 2013 at 8:08