pts / pdfsizeopt

PDF file size optimizer
GNU General Public License v2.0

pnmtopng: fatal libpng error: Extra compressed data; adding imgdataopt #51

Closed: pts closed 1 year ago

pts commented 7 years ago

The image stream in bad_image_extra_data.pdf indeed contains extra bytes after the image data. The expected behavior would be truncating those extra bytes. What happens instead is pdfsizeopt calls sam2p, which calls pnmtopng, which fails with fatal error pnmtopng: fatal libpng error: Extra compressed data, making sam2p fail, making pdfsizeopt fail.

Image viewer qiv also indicates the error Extra compressed data on the corresponding PNG, but it at least shows the image.
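To illustrate the expected truncation, here is a minimal Python sketch. It assumes the extra bytes follow the end of the zlib (/FlateDecode) stream, and the helper name `truncate_extra_data` is made up for this example; it is not a function in pdfsizeopt.

```python
import zlib


def truncate_extra_data(stream_data):
    """Return (clean_stream, extra_byte_count) for zlib-compressed data.

    Hypothetical helper for illustration only, not pdfsizeopt code.
    """
    d = zlib.decompressobj()
    d.decompress(stream_data)  # Raises zlib.error if the stream is corrupt.
    d.flush()
    if not d.eof:
        raise ValueError('zlib stream ends prematurely')
    # Bytes found after the end of the compressed stream end up here.
    extra = d.unused_data
    clean = stream_data[:len(stream_data) - len(extra)]
    return clean, len(extra)
```

Called on a stream with trailing junk, this returns the stream without the junk plus the number of bytes dropped, so the caller could log a warning instead of failing.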

rbrito commented 7 years ago

Hi.

On Wed, Oct 4, 2017 at 4:36 AM, Péter Szabó notifications@github.com wrote:

> The image stream in bad_image_extra_data.pdf indeed contains extra bytes after the image data. The expected behavior would be truncating those extra bytes. What happens instead is pdfsizeopt calls sam2p, which calls pnmtopng, which fails with fatal error pnmtopng: fatal libpng error: Extra compressed data, making sam2p fail, making pdfsizeopt fail.

I don't get this on my system. I'm using an old version of sam2p, 0.49.2-3+b1 (patched by Debian; I have not yet had the time to package a new version and reintroduce it into the archive), and pnmtopng from netpbm version 2:10.0-15.3+b2.

I can send you the compressed file that I get, if you want to check it.

Thanks,

-- 
Rogério Brito : rbrito@{ime.usp.br,gmail.com} : GPG key 4096R/BCFCAAAA
http://cynic.cc/blog/ : github.com/rbrito : profiles.google.com/rbrito
DebianQA: http://qa.debian.org/developer.php?login=rbrito%40ime.usp.br

pts commented 7 years ago

Yes, please attach the temporary .png files pdfsizeopt has created here, and please copy the console output of pdfsizeopt. sam2p shouldn't make a difference. pngtopnm may be different.

pts commented 7 years ago

A radical approach to fix these two bugs (https://github.com/pts/pdfsizeopt/issues/51 and https://github.com/pts/pdfsizeopt/issues/52) is replacing sam2p as a dependency of pdfsizeopt by a newly written tool named imgdataopt, which will provide a very small subset of the functionality of sam2p used by pdfsizeopt:

Non-features:

rbrito commented 7 years ago

Hi, Péter.

On Oct 04 2017, Péter Szabó wrote:

> Yes, please attach the temporary .png files pdfsizeopt has created here, and please copy the output of pdfsizeopt. sam2p shouldn't make a difference. pngtopnm may be different.

I didn't know where you wanted me to break the execution. I broke it as shown in the following trace:


$ ~/Downloads/pdfsizeopt/pdfsizeopt --use-pngout=no --use-multivalent=no bad_image_extra_data.pdf 
info: This is pdfsizeopt rUNKNOWN size=378567.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: bad_image_extra_data.pdf
info: loaded PDF of 24810 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 4; orig width=681 height=250 colorspace=/DeviceRGB bpc=8 inv=False filter=/FlateDecode dp=1 size=24286 gs_device=png16m
info: saving PNG to psotmp.13626.img-4.parse.png
info: written 24130 bytes to PNG
info: optimizing 1 images of 24286 bytes in total
Traceback (most recent call last):
  File "/home/rbrito/Downloads/pdfsizeopt/pdfsizeopt", line 41, in <module>
    sys.exit(main.main(sys.argv, script_dir=script_dir))
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 9100, in main
    pdf.OptimizeImages(img_cmd_patterns=img_cmd_patterns)
  File "/home/rbrito/Downloads/pdfsizeopt/lib/pdfsizeopt/main.py", line 7084, in OptimizeImages
    assert False
AssertionError
$ ls -l
total 52
-rw-r--r-- 1 rbrito rbrito 24810 Oct  5 23:29 bad_image_extra_data.pdf
-rw-r--r-- 1 rbrito rbrito 24130 Oct  5 23:29 psotmp.13626.img-4.parse.png
$

The generated PNG is attached to this message. (I hope GitHub doesn't eat
it on the way; if it does, I will attach it via the regular web
interface.)

Hope this helps,

Rogério.

rbrito commented 7 years ago

I just commented out all the calls to os.remove and here is what I got:

$ ~/Downloads/pdfsizeopt/pdfsizeopt --use-pngout=no --use-multivalent=no bad_image_extra_data.pdf 
info: This is pdfsizeopt rUNKNOWN size=378713.
info: prepending to PATH: /home/rbrito/Downloads/pdfsizeopt
info: loading PDF from: bad_image_extra_data.pdf
info: loaded PDF of 24810 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: will optimize image XObject 4; orig width=681 height=250 colorspace=/DeviceRGB bpc=8 inv=False filter=/FlateDecode dp=1 size=24286 gs_device=png16m
info: saving PNG to psotmp.16495.img-4.parse.png
info: written 24130 bytes to PNG
info: optimizing 1 images of 24286 bytes in total
info: executing image converter sam2p_np: sam2p -pdf:2 -c zip:1:9 -s Gray1:Indexed1:Gray2:Indexed2:Rgb1:Gray4:Indexed4:Rgb2:Gray8:Indexed8:Rgb4:Rgb8:stop -- psotmp.16495.img-4.parse.png psotmp.16495.img-4.sam2p-np.pdf
This is sam2p .
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
libpng warning: IDAT: Extra compressed data
libpng warning: IDAT: Extra compressed data
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.16495.img-4.parse.png
sam2p: Notice: writeTTT: using template: p02
sam2p: Notice: applyProfile: applied OutputRule #11
sam2p: Notice: job: written OutputFile: psotmp.16495.img-4.sam2p-np.pdf
Success.
info: loading image from: psotmp.16495.img-4.sam2p-np.pdf
info: loading PDF from: psotmp.16495.img-4.sam2p-np.pdf
info: loaded PDF of 20463 bytes
info: separated to 5 objs + xref + trailer
info: parsed 5 objs
info: loaded PNG IDAT of 19753 bytes
info: executing image converter sam2p_pr: sam2p -c zip:15:9 -- psotmp.16495.img-4.parse.png psotmp.16495.img-4.sam2p-pr.png
This is sam2p .
Available Loaders: PS PDF JAI PNG JPEG TIFF PNM BMP GIF LBM XPM PCX TGA.
Available Appliers: XWD Meta Empty BMP PNG TIFF6 TIFF6-JAI JPEG-JAI JPEG PNM GIF89a+LZW XPM PSL1C PSL23+PDF PSL2+PDF-JAI P-TrOpBb.
libpng warning: IDAT: Extra compressed data
libpng warning: IDAT: Extra compressed data
sam2p: Notice: PNM: loaded alpha, but no transparent pixels
sam2p: Notice: job: read InputFile: psotmp.16495.img-4.parse.png
sam2p: Notice: applyProfile: applied OutputRule #14
sam2p: Notice: job: written OutputFile: psotmp.16495.img-4.sam2p-pr.png
Success.
info: loading image from: psotmp.16495.img-4.sam2p-pr.png
info: loaded PNG IDAT of 23714 bytes
info: optimized image XObject 4 file_name=psotmp.16495.img-4.sam2p-np.pdf size=19916 (82%) methods=sam2p_np:19916,sam2p_pr:23927,#orig:24286,parse:24286
info: saved 4370 bytes (18%) on optimizable images
info: optimized 1 streams, kept 1 #orig
info: compressed 1 streams, kept 0 of them uncompressed
info: saving PDF with 5 objs to: bad_image_extra_data.pso.pdf
info: generated object stream of 161 bytes in 3 objects (33%)
info: generated 20375 bytes (82%)
rbrito@zatz:/tmp/test$

I will attach the contents of the directory as a tarball.
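For reference, the numbers in the `methods=` field of the log above can be checked with a short Python sketch. The function name `pick_smallest` is hypothetical, and how pdfsizeopt itself breaks ties between equally sized candidates is not shown in the log, so this only reproduces the arithmetic.

```python
def pick_smallest(methods_field):
    """Parse 'name:size,name:size,...' and return (best_name, best_size).

    Hypothetical helper; on ties it returns the candidate listed first.
    """
    sizes = {}
    for item in methods_field.split(','):
        name, size = item.rsplit(':', 1)
        sizes[name] = int(size)
    best = min(sizes, key=sizes.get)
    return best, sizes[best]


methods = 'sam2p_np:19916,sam2p_pr:23927,#orig:24286,parse:24286'
best_name, best_size = pick_smallest(methods)
saved = 24286 - best_size  # Original size minus the best candidate.
percent = round(100.0 * best_size / 24286)
```

Here `best_name` is `sam2p_np`, `saved` is 4370 and `percent` is 82, which matches the "size=19916 (82%)" and "saved 4370 bytes (18%)" lines.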

rbrito commented 7 years ago

test.tar.gz

Here they go.

rbrito commented 7 years ago

Is sam2p calling pnmtopng here? I will test manually to see if pnmtopng dies or not...

rbrito commented 7 years ago

The file with parse in the name is, according to optipng, indeed broken: it tells us to process it with the -fix option. advpng doesn't even care if there is any problem and goes ahead...
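A rough way to confirm this without optipng is to walk the PNG chunks and see whether anything follows the end of the zlib stream in the IDAT data. The sketch below is only an assumption about what such a check could look like (the function name is invented, and it does not verify chunk CRCs or the decompressed size).

```python
import struct
import zlib

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def count_extra_idat_bytes(png_data):
    """Return the number of bytes after the zlib stream in the IDAT data.

    Illustrative only: chunk CRCs and image dimensions are not checked.
    """
    if not png_data.startswith(PNG_SIGNATURE):
        raise ValueError('not a PNG file')
    idat_parts = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack('>L4s', png_data[pos:pos + 8])
        if chunk_type == b'IDAT':
            idat_parts.append(png_data[pos + 8:pos + 8 + length])
        pos += 12 + length
    d = zlib.decompressobj()
    d.decompress(b''.join(idat_parts))
    return len(d.unused_data)
```

A nonzero result for the `.parse.png` file would be consistent with the "Extra compressed data" warning from libpng.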

pts commented 6 years ago

Yes, sam2p calls png22pnm, and if that fails, sam2p calls pngtopnm. The pnmtopng string in the error message is a bug in these tools: it should say png22pnm or pngtopnm, respectively. Neither sam2p nor pdfsizeopt calls pnmtopng.
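The try-one-tool-then-the-next behavior can be sketched in Python as follows. This is not sam2p's code (sam2p is not written in Python), the function name is invented, and the real command lines sam2p passes to png22pnm and pngtopnm are not given in this thread, so callers supply their own argument lists.

```python
import subprocess
import sys


def run_with_fallback(commands):
    """Run each command in order; return (argv, stdout) of the first success.

    Generic illustration of a fallback chain, not sam2p's implementation.
    """
    for argv in commands:
        try:
            result = subprocess.run(
                argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        except OSError:
            # The tool is not installed or not executable; try the next one.
            sys.stderr.write('warning: cannot run %s\n' % argv[0])
            continue
        if result.returncode == 0:
            return argv, result.stdout
        sys.stderr.write('warning: %s failed\n' % argv[0])
    raise RuntimeError('all converters failed')
```

With such a chain, one converter failing on the extra data is not fatal as long as a later one tolerates it, which would explain why results differ between systems with different converter versions.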

Thank you for the uploads!

pts commented 1 year ago

Indeed, using imgdataopt instead of sam2p fixes the problem, because imgdataopt ignores extra data after the image data. (It also ignores the Adler-32 checksum.)
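The same lenient behavior can be approximated in Python by decoding the payload as raw deflate, which skips both the Adler-32 check and anything after the final deflate block. This is only a sketch of the idea with an invented function name; imgdataopt's actual implementation is not shown in this thread. It assumes the standard 2-byte zlib header without a preset dictionary.

```python
import zlib


def inflate_ignoring_trailer(zlib_data):
    """Decompress zlib data, ignoring the Adler-32 checksum and extra bytes.

    Illustrative sketch only, not imgdataopt's code.
    """
    if len(zlib_data) < 2:
        raise ValueError('zlib header missing')
    if zlib_data[1] & 0x20:  # FDICT bit in the FLG byte.
        raise ValueError('preset dictionary not supported')
    # Negative wbits selects raw deflate: no header, no checksum trailer.
    d = zlib.decompressobj(-zlib.MAX_WBITS)
    return d.decompress(zlib_data[2:]) + d.flush()
```

Decoding stops at the end of the deflate data, so a wrong checksum or trailing junk no longer causes an error, at the cost of not detecting some kinds of corruption.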

The change has been rolled out for the Linux, Win32 and macOS program binaries (i.e. sam2p was changed to imgdataopt, without renaming it), and for the instructions in README.md for compiling from source. Thus this issue is fixed.