Thank you for the detailed bug report. Based on the files
image_optimze_error.txt and Pages1-7.psom.pdf you have uploaded, I could figure
out what's going wrong. I'm almost sure that I've identified an easy-to-fix bug
in your jbig2.exe. Once you fix the bug, recompile jbig2.exe, and rerun
pdfsizeopt, it will be fine.
On Windows it's possible to open files in either ASCII (text) or binary mode.
ASCII is the default; you can have binary by passing ...|O_BINARY as the 2nd
argument of open(), or passing a string containing "b" (e.g. "rb" instead of
"r"; "wb" instead of "w") as the 2nd argument of fopen(), or calling
setmode(1, O_BINARY) to put stdout into binary mode. If a file is opened in
ASCII mode, then all writes (e.g. write(...), putchar(...), fwrite(...),
fprintf(...)) of "\n" (10) actually write "\r\n" (13, 10) to the file.
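A minimal sketch of the effect described above, written in Python so that it runs on any platform. The helper name emulate_text_mode_write is made up for illustration; it only imitates the newline translation of a Windows ASCII-mode stream and is not code from jbig2 or pdfsizeopt.

```python
def emulate_text_mode_write(data):
    """Return the bytes a Windows ASCII-mode stream would store for `data`.

    Every "\n" (10) is stored as "\r\n" (13, 10); all other bytes,
    including an existing "\r", pass through unchanged.
    """
    return data.replace(b"\n", b"\r\n")


# Arbitrary binary payload that happens to contain three bytes of value 10.
payload = bytes([0x00, 0x0A, 0xFF, 0x0A, 0x0D, 0x0A, 0x42])
stored = emulate_text_mode_write(payload)

# The stored data grows by exactly one byte per "\n" in the payload.
extra = len(stored) - len(payload)
print(extra, payload.count(b"\n"))
```

For compressed image data this growth is not harmless padding: the extra bytes land in the middle of the stream, so a decoder reads different data than the encoder produced.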
In our case, jbig2.exe writes the JBIG2-compressed image to its stdout, e.g.
see the line
info: executing image optimizer jbig2: jbig2 -p pso.conv-3.sam2p-pr.png
>pso.conv-3.jbig2
in the image_optimze_error.txt you have uploaded. The bug is that jbig2.exe
writes to stdout in ASCII mode, but binary mode would be correct. It's easy to
fix: please add setmode(1, O_BINARY) to the beginning of the main() function of
jbig2.exe, recompile jbig2.exe, and rerun the optimization like this:
$ pdfsizeopt.py --use-pngout=no Pages1-7.pdf
Now Pages1-7.psom.pdf should be correct, and the JBIG2 file should be a few
bytes shorter, as indicated on the console output. Old, incorrect:
info: optimized image XObject 3 file_name=pso.conv-3.jbig2 size=2109 (58%)
methods=jbig2:2109,#orig:3637,pngout:6793,sam2p_np:7011,sam2p_pr:8586,gs:11056
New, correct:
info: optimized image XObject 3 file_name=pso.conv-3.jbig2 size=2102 (58%)
methods=jbig2:2102,#orig:3637,sam2p_np:7011,sam2p_pr:8586,gs:11050
(Please note the difference between 2109 and 2102 bytes.)
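As a rough cross-check, the methods= fields of the two log lines can be compared mechanically. This is only a sketch: the name:size,name:size layout is inferred from the two lines quoted above, and parse_methods is a hypothetical helper, not a function of pdfsizeopt.

```python
def parse_methods(field):
    """Parse 'name:size,name:size,...' into a {name: size} dict."""
    sizes = {}
    for item in field.split(","):
        # Split on the last colon so that names like "#orig" stay intact.
        name, _, size = item.rpartition(":")
        sizes[name] = int(size)
    return sizes


# The two methods= fields copied from the console output quoted above.
old = parse_methods(
    "jbig2:2109,#orig:3637,pngout:6793,sam2p_np:7011,sam2p_pr:8586,gs:11056")
new = parse_methods(
    "jbig2:2102,#orig:3637,sam2p_np:7011,sam2p_pr:8586,gs:11050")

# Number of bytes by which the jbig2 output shrank after the fix.
saved = old["jbig2"] - new["jbig2"]
print(saved)
```

The jbig2 output shrinks by 7 bytes, which matches the 7 extra \r characters mentioned further down in this comment.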
If this O_BINARY change doesn't fix the problem, then please upload the entire
directory (containing the pso.* temporary files) ZIPped as an attachment to
this issue. Also include the recompiled jbig2.exe you use, and the console
output of pdfsizeopt.
To illustrate my point, I've modified a few bytes of Pages1-7.psom.pdf: I've
removed the 7 extra \r characters (and added some padding after the obj to
make the file size the same). This effectively fixed the image of page 2. So if
you make jbig2.exe not emit the \r characters, most probably the whole PDF
would be fixed.
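The manual repair described above can be sketched roughly as follows. This assumes the damaged bytes are the complete output of a stream on which every "\n" was written as "\r\n"; strip_inserted_cr is a made-up name, and the sketch works on the raw image data only, not on a whole PDF (where stream lengths and object offsets would also have to be kept consistent, hence the padding mentioned above).

```python
def strip_inserted_cr(data):
    """Undo the "\n" -> "\r\n" translation of an ASCII-mode write.

    Returns (repaired_bytes, number_of_removed_cr_bytes). Valid only if
    *all* of `data` went through the translation, because then every "\n"
    is preceded by exactly one inserted "\r".
    """
    removed = data.count(b"\r\n")
    return data.replace(b"\r\n", b"\n"), removed


# Round trip on an arbitrary payload that already contains a "\r\n" pair.
original = bytes([0x00, 0x0A, 0x0D, 0x0A, 0x42])
damaged = original.replace(b"\n", b"\r\n")  # what ASCII mode would store
repaired, removed = strip_inserted_cr(damaged)
print(repaired == original, removed)
```

Applying this to data that was written correctly in binary mode would corrupt it, so fixing jbig2.exe itself remains the proper solution.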
If you manage to fix jbig2.exe, please upload it as an attachment to this
issue, so that others can benefit as well.
Original comment by pts...@gmail.com
on 26 Jun 2012 at 9:10
That fixed it. Thanks for all your help!!!
Attached is my vs2010 compiled jbig2.exe and all the source code in case
someone else wants to compile it.
Original comment by fdnc...@gmail.com
on 27 Jun 2012 at 1:05
Thank you for sharing your jbig2.exe and your source tree.
jbig2.exe was one of the missing dependencies of pdfsizeopt on Windows. Today I
compiled the remaining few dependencies, so now pdfsizeopt is officially
available on Windows, and it's easier to install than ever. If you're
interested, please check out the new installation page at
http://code.google.com/p/pdfsizeopt/wiki/InstallationInstructionWindows .
It would be very useful if you could upload all the library dependencies of
jbig2enc_20120627.zip, including the URLs where you downloaded them from, and
a .cmd file which compiles all the dependencies from scratch. That way we could
tell a future developer to install Visual Studio, download and extract a .zip
file, run a .cmd file, and wait for jbig2.exe to be built automatically.
Original comment by pts...@gmail.com
on 28 Jun 2012 at 2:20
Hey, glad I could help.
I followed the instructions here
http://tpgit.github.com/UnOfficialLeptDocs/leptonica/README.html#building-on-windows
to compile Leptonica (http://leptonica.com/) and download the dependencies.
I think you can just download the dependencies
(http://leptonica.org/source/leptonica-1.68-win32-lib-include-dirs.zip) and
put everything in the right place to compile the jbig2 encoder. I may have
done that. I can't remember. ;)
Darren
Original comment by fdnc...@gmail.com
on 9 Jul 2012 at 6:47
This is what I get when I run your new Windows version.
C:\Users\x991808\Desktop\pdfsizeopt_win32bin>pdfsizeopt.exe 000000.PDF
info: This is pdfsizeopt.py rUNKNOWN size=309327.
info: loading PDF from: 000000.PDF
info: loaded PDF of 515655 bytes
info: separated to 26 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: eliminated 2 unused objs in 2 classes
info: saving PDF with 24 objs with Multivalent to: 000000.psom.pdf
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: generated object stream of 529 bytes in 21 objects (14%)
info: written 513629 bytes to Multivalent input PDF: pso.conv.mi.tmp.pdf
error: Multivalent.jar not found. Make sure it is on the $PATH, or it is
one of the files on the $CLASSPATH.
Traceback (most recent call last):
  File ".\pdfsizeopt.py", line 7698, in <module>
    main(sys.argv)
  File ".\pdfsizeopt.py", line 7694, in main
    may_obj_heads_contain_comments=may_obj_heads_contain_comments)
  File ".\pdfsizeopt.py", line 7425, in Save
    may_obj_heads_contain_comments=may_obj_heads_contain_comments)
  File ".\pdfsizeopt.py", line 7322, in _RunMultivalent
    assert 0, 'Multivalent.jar not found, see above'
AssertionError: Multivalent.jar not found, see above
Original comment by fdnc...@gmail.com
on 9 Jul 2012 at 6:56
AssertionError: Multivalent.jar not found, see above
Did you follow the installation instructions? Did you download the newest
pdfsizeopt.py (its size is 313571)? If that still doesn't fix the problem,
please copy-paste the output of
dir /s C:\Users\x991808\Desktop\pdfsizeopt_win32bin
Original comment by pts...@gmail.com
on 9 Jul 2012 at 8:14
Yes, I followed the instructions but I tried again this morning (re-doing all
the instructions) and everything is working fine now. Running a massive PDF to
test at the moment. So far so good. I just wish there was a way to speed up
pngout. That thing takes forever.
Original comment by fdnc...@gmail.com
on 10 Jul 2012 at 2:29
One last thing you should add is the msvcr100.dll since I compiled jbig2.exe
with vs2010. Here's mine.
Original comment by fdnc...@gmail.com
on 10 Jul 2012 at 2:57
About pngout: you can use --use-pngout=no. There is a speed vs. size tradeoff
here: pngout is slow, but its output is small.
Original comment by pts...@gmail.com
on 10 Jul 2012 at 3:08
Based on the information you have provided, I managed to compile a jbig2.exe
(see it attached) suitable for use with pdfsizeopt. I compiled it using MinGW
(cross-compiling on Linux), so it doesn't need msvcr100.dll . (I also removed
the attached msvcr100.dll to avoid copyright issues in the future.)
In the near future, I'll release this new jbig2.exe so it will be used by
default with pdfsizeopt on Windows.
FYI My jbig2.exe is noticeably smaller than yours, because I removed many
unnecessary functions from the leptonica library (editing .c files by hand),
and I also removed a few command-line flags which pdfsizeopt doesn't need.
Thank you very much for your help providing patches and compilation
instructions, it helped me a lot in understanding jbig2 on Windows and
preparing my own version.
Original comment by pts...@gmail.com
on 11 Jul 2012 at 12:51
Excellent! Glad to hear you were able to get it compiled. It wasn't trivial
in VS2010 for me, but MinGW is probably the easier choice, especially if you're
used to Linux/gcc. Sorry I wasn't able to provide the batch file you
requested. Just too much going on right now to mess with it.
You might want to try out this alternate version of JBIG2Enc
https://github.com/zdenop/jbig2enc/tree/R.Hatlapatka. It's supposed to have
better autothresholding which I interpret to mean better compression on some
images assuming the thresholding works. I haven't tried it yet.
BTW - I tried --use-pngout=no on my 146MB PDF file. It took 20 minutes
instead of 2.5 hours and the file sizes were identical. So pngout doesn't seem
to help unless you have color images. My test file was all CCITTFaxDecode, so
maybe if you see that (which is always bitonal) you shouldn't call pngout?
Just an idea to save time.
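The idea could look roughly like the sketch below. Everything in it is hypothetical: should_try_pngout and BITONAL_FILTERS are illustrative names, not part of pdfsizeopt, and the rule is based only on the single test file described above.

```python
# PDF filter names that imply a bitonal (1 bit per pixel) image.
# Only the filter mentioned above is listed; this set is an assumption.
BITONAL_FILTERS = {"CCITTFaxDecode"}


def should_try_pngout(filters):
    """Return False if any of the image's filters marks it as bitonal.

    `filters` is a list of PDF filter names, with or without the
    leading "/" used in PDF syntax.
    """
    return not any(name.lstrip("/") in BITONAL_FILTERS for name in filters)


print(should_try_pngout(["/CCITTFaxDecode"]))
print(should_try_pngout(["/FlateDecode"]))
```

Whether skipping pngout is safe in general would still have to be measured on more files; one 146MB document is not much evidence.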
Original comment by fdnc...@gmail.com
on 11 Jul 2012 at 5:23
Original issue reported on code.google.com by
fdnc...@gmail.com
on 26 Jun 2012 at 1:03