Missing images in output file

GoogleCodeExporter commented 9 years ago

What command do you run to optimize the PDF?

./pdfsizeopt.py --use-pngout=false --use-jbig2=false --use-multivalent=false 
test.pdf

What does pdfsizeopt display when running the command above?

The expected output, with no errors.

(If you see rUNKNOWN in the first few output lines, then you can help us by
downloading pdfsizeopt.py using Subversion (svn checkout). If it's easy for
you, then please do so, and run that pdfsizeopt.py again. Now you should
see the revision number instead of rUNKNOWN)

This is latest from source. (r166)

What's wrong with the optimized PDF?

Some images are missing. Which ones go missing doesn't seem to follow a pattern 
across PDFs, but for a given PDF it's always the same images.

What should be there in the optimized PDF instead?

The images...

This may be the same issue that was discussed previously: 
http://code.google.com/p/pdfsizeopt/issues/detail?id=37

I can provide a sample PDF via email; I don't want to post it here publicly 
for various reasons. If I run the command with --do-optimize-images=false, the 
PDF is fine.

Original issue reported on code.google.com by beard.da...@gmail.com on 29 Jul 2011 at 1:43

GoogleCodeExporter commented 9 years ago

That command should be: ./pdfsizeopt.py --use-pngout=true --use-jbig2=true 
--use-multivalent=true test.pdf

Original comment by beard.da...@gmail.com on 29 Jul 2011 at 1:52

GoogleCodeExporter commented 9 years ago

Please attach a test.pdf (as small as possible) and the output .pdf. Please 
also copy-paste what pdfsizeopt displays.

Original comment by pts...@gmail.com on 29 Jul 2011 at 3:00

GoogleCodeExporter commented 9 years ago

Sorry about taking a while to get back to this. I'm including both the 
test PDF and the output PDF for comparison. You can see that the middle image is 
missing in the output PDF. Here's the output after running the above command:

(I know it says rUNKNOWN, but this is the latest from trunk as of today, 
9/10/2011.)

info: This is pdfsizeopt.py rUNKNOWN size=279540.
info: loading PDF from: test.pdf
info: loaded PDF of 115130 bytes
info: separated to 99 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 3 Type1C fonts loaded
info: eliminated 1 duplicate /Type1C font data
info: writing Type1CParser (11775 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH 
-sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 901 20110207
Type1CParser: all OK
info: parsed 2 Type1C fonts
info: could not merge descs from different /Flags values: target=4 source=32 to 
/BYZGEZ+GalliardStd-Roman: /SFWXHJ+GalliardStd-Roman
info: will optimize image XObject 8; orig width=1 height=1 
colorspace=/DeviceGray bpc=8 filter=None dp=0 size=187 gs_device=pnggray
info: saving PNG to pso.conv-8.parse.png
info: written 67 bytes to PNG
info: using identical image obj 8 for obj 11
info: will optimize image XObject 22; orig width=226 height=377 
colorspace=/DeviceGray bpc=8 filter=/FlateDecode dp=0 size=583 gs_device=pnggray
info: saving PNG to pso.conv-22.parse.png
info: written 488 bytes to PNG
info: optimizing 2 images of 770 bytes in total
info: executing image optimizer sam2p_np: sam2p -pdf:2 -c zip:1:9 -s 
Gray1:Indexed1:Gray2:Indexed2:Rgb1:Gray4:Indexed4:Rgb2:Gray8:Indexed8:Rgb4:Rgb8:
stop -- pso.conv-8.parse.png pso.conv-8.sam2p-np.pdf
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM 
GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: pso.conv-8.parse.png
sam2p: Warning: SampleFormat: Opaque would be better than Gray1
sam2p: Notice: writeTTT: using template: p02
sam2p: Notice: applyProfile: applied OutputRule #0
sam2p: Notice: job: written OutputFile: pso.conv-8.sam2p-np.pdf
Success.
info: loading image from: pso.conv-8.sam2p-np.pdf
info: loading PDF from: pso.conv-8.sam2p-np.pdf
info: loaded PDF of 698 bytes
info: separated to 5 objs + xref + trailer
info: loaded PNG IDAT of 9 bytes
info: executing image optimizer sam2p_pr: sam2p -c zip:15:9 -- 
pso.conv-8.parse.png pso.conv-8.sam2p-pr.png
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM 
GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: pso.conv-8.parse.png
sam2p: Warning: SampleFormat: Opaque would be better than Gray1
sam2p: Notice: applyProfile: applied OutputRule #2
sam2p: Notice: job: written OutputFile: pso.conv-8.sam2p-pr.png
Success.
info: loading image from: pso.conv-8.sam2p-pr.png
info: loaded PNG IDAT of 10 bytes
info: executing image optimizer jbig2: jbig2 -p pso.conv-8.sam2p-pr.png 
>pso.conv-8.jbig2
info: executing image optimizer pngout: pngout pso.conv-8.parse.png 
pso.conv-8.pngout.png
 In:      67 bytes               pso.conv-8.parse.png /c0 /f0 /d8
Out:      90 bytes               pso.conv-8.pngout.png /c3 /f0 /d1, 1 colors
Unable to compress further: copying original file
info: keeping original image XObject 8, replacements too large: 
#orig:187,parse:202,sam2p_np:202,sam2p_pr:262,jbig2:264
info: executing image optimizer sam2p_np: sam2p -pdf:2 -c zip:1:9 -s 
Gray1:Indexed1:Gray2:Indexed2:Rgb1:Gray4:Indexed4:Rgb2:Gray8:Indexed8:Rgb4:Rgb8:
stop -- pso.conv-22.parse.png pso.conv-22.sam2p-np.pdf
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM 
GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: pso.conv-22.parse.png
sam2p: Warning: SampleFormat: Opaque would be better than Indexed1
sam2p: Notice: writeTTT: using template: p02ind1
sam2p: Notice: applyProfile: applied OutputRule #1
sam2p: Notice: job: written OutputFile: pso.conv-22.sam2p-np.pdf
Success.
info: loading image from: pso.conv-22.sam2p-np.pdf
info: loading PDF from: pso.conv-22.sam2p-np.pdf
info: loaded PDF of 798 bytes
info: separated to 5 objs + xref + trailer
info: loaded PNG IDAT of 33 bytes and PLTE of 6 bytes
info: executing image optimizer sam2p_pr: sam2p -c zip:15:9 -- 
pso.conv-22.parse.png pso.conv-22.sam2p-pr.png
This is sam2p 0.49.
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM 
GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: pso.conv-22.parse.png
sam2p: Warning: SampleFormat: Opaque would be better than Indexed1
sam2p: Notice: applyProfile: applied OutputRule #3
sam2p: Notice: job: written OutputFile: pso.conv-22.sam2p-pr.png
Success.
info: loading image from: pso.conv-22.sam2p-pr.png
info: loaded PNG IDAT of 33 bytes and PLTE of 3 bytes
info: saving PNG to pso.conv-22.gray.png
info: written 90 bytes to PNG
info: executing image optimizer jbig2: jbig2 -p pso.conv-22.gray.png 
>pso.conv-22.jbig2
info: executing image optimizer pngout: pngout pso.conv-22.parse.png 
pso.conv-22.pngout.png
 In:     488 bytes               pso.conv-22.parse.png /c0 /f0 /d8
Out:     105 bytes               pso.conv-22.pngout.png /c3 /f0 /d1, 1 colors
Chg:    -383 bytes ( 21% of original)
info: loading image from: pso.conv-22.pngout.png
info: loaded PNG IDAT of 33 bytes and PLTE of 3 bytes
info: optimized image XObject 22 file_name=pso.conv-22.sam2p-np.pdf size=226 
(39%) methods=sam2p_np:226,jbig2:272,pngout:283,sam2p_pr:283,#orig:583,parse:583
info: saved 357 bytes (46%) on optimizable images
info: eliminated 15 duplicate objs
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 83 objs to: pso.conv.mi.tmp.pdf
info: generated 99673 bytes (87%)
info: executing Multivalent to optimize PDF: java -cp 
/Users/danny/Desktop/pdfsizeopt/Multivalent.jar tool.pdf.Compress -nopagepiece 
-noalt pso.conv.mi.tmp.pdf
file:/Users/danny/Desktop/pdfsizeopt/pso.conv.mi.tmp.pdf, 99673 bytes
PDF 1.5, producer=PDFpen, creator=PDFpen
additional compression may be possible with:
     -compact -jpeg
=> new length = 91708, saved 7%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 91730 bytes (92%)
info: compressed xref stream from 395 to 395 bytes (100%)
info: optimized to 91544 bytes after Multivalent (100%)
info: saving PDF to: test.psom.pdf
info: generated 91544 bytes (80%)

Original comment by beard.da...@gmail.com on 10 Sep 2011 at 9:19
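
For anyone triaging a similar report, the two per-image summary lines in the log above ("optimized image XObject ..." and "keeping original image XObject ...") show which image objects were actually replaced, and those are the first candidates to check when an image goes missing. Below is a minimal Python sketch that pulls those lines apart. It is not part of pdfsizeopt: the function name parse_image_line and the returned dictionary layout are made up for illustration, the line format is assumed from this one log only, and wrapped lines are assumed to have been re-joined first.

```python
import re

# Line formats are assumed from the single log pasted in this thread.
HEAD_RE = re.compile(r"info: (optimized|keeping original) image XObject (\d+)")
PAIR_RE = re.compile(r"(#?\w+):(\d+)")


def parse_image_line(line):
    """Return a summary dict for a per-image log line, or None otherwise.

    Hypothetical helper, not a pdfsizeopt API.
    """
    head = HEAD_RE.match(line)
    if head is None:
        return None
    # The name:size candidate list is the last whitespace-separated token.
    token = line.split()[-1]
    sizes = {name: int(size) for name, size in PAIR_RE.findall(token)}
    return {
        "obj": int(head.group(2)),
        "replaced": head.group(1) == "optimized",
        "sizes": sizes,
    }


# The two summary lines from the log above, with the line wrapping undone.
SAMPLE = [
    "info: keeping original image XObject 8, replacements too large: "
    "#orig:187,parse:202,sam2p_np:202,sam2p_pr:262,jbig2:264",
    "info: optimized image XObject 22 file_name=pso.conv-22.sam2p-np.pdf "
    "size=226 (39%) "
    "methods=sam2p_np:226,jbig2:272,pngout:283,sam2p_pr:283,#orig:583,parse:583",
]

results = [parse_image_line(line) for line in SAMPLE]
for r in results:
    smallest = min(r["sizes"], key=r["sizes"].get)
    state = "replaced" if r["replaced"] else "kept original"
    print(r["obj"], state, "smallest candidate:", smallest)
```

In this log only XObject 22 was replaced (XObject 8 was kept as-is), which narrows down where to look. The sketch says nothing about why an image is lost; it only lists what the optimizer reported.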

Attachments:

GoogleCodeExporter commented 9 years ago

Thank you for submitting the detailed bug report. I can confirm that the top 
middle image in the attached test.pdf gets lost when converted with 
pdfsizeopt.py r168, either way:

  ./pdfsizeopt.py --use-pngout=false --use-jbig2=false --use-multivalent=false test.pdf
  ./pdfsizeopt.py --use-pngout=false --use-jbig2=false --use-multivalent=false test.pdf

This is most probably a bug in pdfsizeopt.py or in sam2p. I'll investigate.

Original comment by pts...@gmail.com on 12 Sep 2011 at 12:27

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

This also results in a PDF with a missing image:

  ./pdfsizeopt.py --use-pngout=true --use-jbig2=true --use-multivalent=true test.pdf

Original comment by pts...@gmail.com on 12 Sep 2011 at 12:28

GoogleCodeExporter commented 9 years ago

Thanks. I can confirm that setting --do-optimize-images=false keeps the 
images (at the cost of losing the image optimization benefits, I 
suppose):

./pdfsizeopt.py --do-optimize-images=false test.pdf

Original comment by beard.da...@gmail.com on 12 Sep 2011 at 12:48

GoogleCodeExporter commented 9 years ago

Fixed in r170 based on the information you have provided. Thanks!

Original comment by pts...@gmail.com on 12 Sep 2011 at 1:56

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Thanks! - I can verify on my end that all of the pdfs that were giving me 
issues previously work correctly now. Great tool!

Original comment by beard.da...@gmail.com on 12 Sep 2011 at 3:10
