Stack trace when running pdfsizeopt

GoogleCodeExporter commented 8 years ago

Dear Peter,

Using revision 224 of pdfsizeopt with the document at http://www.sp4comm.org/docs/sp4comm.pdf, I am getting the following output:

----

info: This is pdfsizeopt.py rUNKNOWN size=318356.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: sp4comm.pdf
info: loaded PDF of 4175835 bytes
info: using Ghostscript gs: GPL Ghostscript 9.05 (2012-02-08)
info: decompressing 72 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: decompressing 2211 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 1914 obj offsets and 39 obj streams in xref stream
Traceback (most recent call last):
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 7887, in <module>
    main(sys.argv)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 7849, in main
    ).Load(file_name)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3504, in Load
    data, do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3859, in ParseUsingXref
    xref_ofs, xref_obj_num, xref_generation)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 3790, in ParseUsingXrefStream
    obj_num)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 2834, in ParseObjStm
    objstm_data = self.GetUncompressedStream()
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 2327, in GetUncompressedStream
    return PermissiveZlibDecompress(self.stream)
  File "/tmp/pdfsizeopt/trunk/pdfsizeopt.py", line 229, in PermissiveZlibDecompress
    uncompressed = zlib.decompressobj().decompress(data)
zlib.error: Error -3 while decompressing: incorrect header check

----

Since the document may be updated, the md5sum of the copy that I have here is f22d1e72e38601a4b32861c653e2b24d. I don't know whether I am allowed to post the file here, but I will gladly do so if you ask.

I suspect that the problem with the decompression *may* be related to the fact 
that the document is (apparently) encrypted, as the output of pdfinfo is:

----

pdfinfo sp4comm.pdf 
Title:          livre.dvi
Creator:        dvips(k) 5.94b Copyright 2004 Radical Eye Software
Producer:       Acrobat Distiller 8.1.0 (Macintosh)
CreationDate:   Tue Apr 29 17:05:46 2008
ModDate:        Sun Nov 22 17:43:59 2009
Tagged:         no
Pages:          388
Encrypted:      yes (print:yes copy:no change:no addNotes:no)
Page size:      453.543 x 680.315 pts
File size:      4175835 bytes
Optimized:      yes
PDF version:    1.6

----

Is that the case?
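
For illustration, the zlib error in the trace above can be reproduced without any PDF. The sketch below assumes that encrypted stream bytes look like arbitrary data to zlib. It simulates this with a simple XOR, which is not real PDF encryption, and the helper name `try_decompress` is made up for this example.

```python
import zlib

# Stand-in for a compressed PDF stream body (not taken from the real file).
plain = zlib.compress(b"1 0 obj << /Type /Catalog >> endobj\n" * 20)

# Scramble every byte with XOR to mimic what encryption does to the data.
# This is only an illustration; PDF encryption uses RC4 or AES, not XOR.
scrambled = bytes(b ^ 0xA5 for b in plain)


def try_decompress(data):
    """Return (ok, message) instead of letting zlib.error propagate."""
    try:
        zlib.decompressobj().decompress(data)
        return True, "ok"
    except zlib.error as e:
        return False, str(e)


print(try_decompress(plain))
print(try_decompress(scrambled))
```

The scrambled data no longer starts with a valid zlib header, so decompression fails before any payload is read, which matches the "incorrect header check" message in the trace.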

If you need any further information, please let me know.

Thanks,
Rogério Brito.

P.S.: As a last resort, of course, it is easy to circumvent such restrictions with, for example, a pass through Ghostscript (e.g., ps2pdf), but it would be nice if pdfsizeopt warned the user that the input is encrypted (perhaps with a message suggesting the workaround above) instead of dumping a stack trace.

Original issue reported on code.google.com by rbr...@gmail.com on 26 Feb 2013 at 2:38

GoogleCodeExporter commented 8 years ago

Thank you for reporting this bug.

Indeed, pdfsizeopt doesn't support encrypted PDFs. Issue 51 is already open about that.

However, the error message printed for your input file was not informative 
enough, so I fixed that in r226. The new error message is:

info: This is pdfsizeopt.py r226 size=321397.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: issue76.pdf
info: loaded PDF of 4175835 bytes
Traceback (most recent call last):
  File "./pdfsizeopt.py", line 8097, in <module>
    main(sys.argv)
  File "./pdfsizeopt.py", line 8059, in main
    ).Load(file_name)
  File "./pdfsizeopt.py", line 3692, in Load
    '.decrypted.pdf')))
NotImplementedError: encrypted PDF input not supported, use this command to decrypt first: qpdf --decrypt issue76.pdf issue76.decrypted.pdf
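
A check along the following lines could produce a message of this shape. It is only a sketch, not the code committed in r226. The helper name `check_not_encrypted` and the byte-level search for an `/Encrypt` key are assumptions made for illustration. A real implementation would look at the parsed trailer dictionary rather than scanning raw bytes.

```python
import os
import re


def check_not_encrypted(data, file_name):
    """Raise NotImplementedError if the raw PDF bytes contain an /Encrypt key.

    Hypothetical helper: scanning raw bytes is a crude heuristic and can
    misfire if "/Encrypt" happens to appear inside unrelated binary data.
    """
    # \b keeps longer names such as /EncryptMetadata from matching.
    if re.search(rb"/Encrypt\b", data):
        base, _ = os.path.splitext(file_name)
        raise NotImplementedError(
            "encrypted PDF input not supported, use this command to "
            "decrypt first: qpdf --decrypt %s %s"
            % (file_name, base + ".decrypted.pdf"))


try:
    check_not_encrypted(b"<< /Root 1 0 R /Encrypt 5 0 R >>", "issue76.pdf")
except NotImplementedError as e:
    print(e)
```

Failing early like this, before any stream is decompressed, lets the user see the cause and a workaround instead of a zlib error from deep inside the parser.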

Original comment by pts...@gmail.com on 27 Feb 2013 at 10:15

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Great, much more informative, and separating the code into the extra class makes it a bit easier to read.

Thanks.

Original comment by rbr...@gmail.com on 27 Feb 2013 at 11:15
