tjko / jpegoptim

jpegoptim - utility to optimize/compress JPEG files
http://www.iki.fi/tjko/projects.html
GNU General Public License v3.0

Option to NOT repair damaged files? #85

Closed: catharsis71 closed this issue 2 years ago

catharsis71 commented 2 years ago

In certain cases files with corruption are repaired automatically:

file1.jpg 640x480 24bit N JFIF  (Corrupt JPEG data: 111 extraneous bytes before marker 0xd9)  [WARNING] 16075 --> 13650 bytes (15.09%), optimized.
file2.jpg 320x240 24bit N JFIF  (Corrupt JPEG data: 55 extraneous bytes before marker 0xd9)  [WARNING] 83338 --> 78932 bytes (5.29%), optimized.

It would be very useful to have an option to skip processing files for which corruption such as this is detected

I know you can do a dry run but I'd prefer not to have to do a dry run every time I use the software just in case there are any damaged files (which there usually aren't)

Compare to optipng (a similar program but for PNG files), which by default just outputs a warning for damaged files, and only repairs them if the "-fix" option is used. Repairing probably should be the desired default for most people but could there not be a "-nofix" option for those like me who want to examine the files more closely before deciding whether to repair or not?

tjko commented 2 years ago

This sounds like a good idea, and it shouldn't be difficult to implement a new --nofix option... Do you have an example (corrupted) JPEG file to provide for testing?

catharsis71 commented 2 years ago

This includes example files for what seems to be the 5 most common types of repairable warnings:

Premature end of JPEG file
premature end of data segment
extraneous bytes before marker
bad Huffman code
Invalid SOS parameters for sequential JPEG

examples.zip

tjko commented 2 years ago

Thanks for the samples.

I added a new --nofix option (91054c26540fb24e6fed348f9afa01428b02ec2a). If you can, please test that it works as expected.

catharsis71 commented 2 years ago

Excellent, this is great

I did discover one more scenario that I had a question about

Sometimes people hide archives inside images, for example by appending a ZIP or RAR to the end of an image

with PNG images, optipng detects these ("Extraneous data found after IEND") and only processes them if -fix is used

however I've noticed that for JPEGs, jpegoptim doesn't even throw a warning:

$ jpegoptim -n -v *.jpg
hidden.jpg 570x362 24bit N IPTC JFIF  [OK] 6127023 --> 110390 bytes (98.20%), optimized.

In this case, if you're paying attention, you can notice that something unusual happened because of the drastic file size reduction, but it may be much harder to notice in other cases, for example if the hidden archive is relatively small

Would it be theoretically possible to detect and throw a warning in cases like this?

This is getting out of the scope of what I opened the issue for... if you think it's feasible I could open another issue for it, but if it's not feasible or not something you want to do then that's fine, no big deal.

Thanks for all your assistance.

I've attached a zip containing the "hidden.jpg" referenced above, if you want it.

example-new.zip

tjko commented 2 years ago

jpegoptim relies on libjpeg for decompression (and compression), and it looks like libjpeg does not detect (or care about) any data that may be present after the end of the JPEG image data....

It looks like checking the current position in the input file after decompression is complete would reveal whether there is extraneous data in the file.

I added a quick check (9aad9235cd8f1ef4b68139d03682b9e7c794b6ec), and it seems to work as expected (except when the input file is fed via stdin...)
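
To illustrate the idea of comparing the end of the image data with the end of the file, here is a minimal Python sketch. It is not the code from the commit above and does not use libjpeg; instead it walks the JPEG markers itself to find the first EOI (0xFFD9) that terminates the stream, then counts whatever follows. The function names `jpeg_end_offset` and `trailing_bytes` are hypothetical, and the parser assumes a well-formed baseline or progressive stream.

```python
import struct

# Markers with no length field: TEM (0x01) and RST0..RST7 (0xD0..0xD7).
STANDALONE = {0x01} | set(range(0xD0, 0xD8))


def _skip_scan(data: bytes, pos: int) -> int:
    """Skip entropy-coded data; return the offset of the next real marker."""
    n = len(data)
    while True:
        pos = data.find(b"\xff", pos)
        if pos < 0 or pos + 1 >= n:
            return n  # ran off the end without finding a marker
        nxt = data[pos + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos += 2  # stuffed zero byte or restart marker: still scan data
        elif nxt == 0xFF:
            pos += 1  # fill byte before a marker
        else:
            return pos


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the EOI marker that ends the JPEG stream."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos, n = 2, len(data)
    while pos < n:
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < n and data[pos] == 0xFF:
            pos += 1  # skip the 0xFF prefix and any fill bytes
        if pos >= n:
            break
        marker = data[pos]
        pos += 1
        if marker == 0xD9:  # EOI
            return pos
        if marker in STANDALONE:
            continue
        if pos + 2 > n:
            break
        (seglen,) = struct.unpack(">H", data[pos:pos + 2])
        if seglen < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        pos += seglen  # the length field counts itself plus the payload
        if marker == 0xDA:  # SOS: entropy-coded data follows the header
            pos = _skip_scan(data, pos)
    raise ValueError("premature end of JPEG data")


def trailing_bytes(data: bytes) -> int:
    """Number of bytes that follow the end of the JPEG image data."""
    return len(data) - jpeg_end_offset(data)
```

Calling `trailing_bytes(open(path, "rb").read())` on a file returns 0 when nothing follows the image. Walking the markers, rather than searching for the last 0xFFD9 in the file, avoids being fooled by appended data that happens to contain those two bytes.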

catharsis71 commented 2 years ago

I compiled the current version and it's looking really good. This is going to be super useful.

$ jpegoptim-test --nofix -v *.jpg
hidden.jpg 570x362 24bit N IPTC JFIF  (Extraneous data found after end of JPEG image)  [WARNING]  [ERROR]
prehidden.jpg 570x362 24bit N IPTC JFIF  [OK] 110390 --> 110390 bytes (0.00%), skipped.
$ jpegoptim-test -n -v *.jpg
hidden.jpg 570x362 24bit N IPTC JFIF  (Extraneous data found after end of JPEG image)  [OK] 6127023 --> 110390 bytes (98.20%), optimized.
prehidden.jpg 570x362 24bit N IPTC JFIF  [OK] 110390 --> 110390 bytes (0.00%), skipped.

Only thing that struck me as slightly odd... if I typo an argument (like "--no-fix" instead of "--nofix") it outputs an error but then goes ahead and processes the files anyway (potentially doing something undesired). I think most programs will abort if an unknown argument is encountered; in fact, I don't remember ever seeing a program that would just ignore the unknown argument and continue running. Is this as intended?

$ jpegoptim-test --no-fix *.jpg
jpegoptim-test: unrecognized option '--no-fix'
hidden.jpg 570x362 24bit N IPTC JFIF  (Extraneous data found after end of JPEG image)  [OK] 6127023 --> 110390 bytes (98.20%), optimized.
prehidden.jpg 570x362 24bit N IPTC JFIF  [OK] 110390 --> 110390 bytes (0.00%), skipped.

tjko commented 2 years ago

Thanks, I guess nobody had noticed the weird argument parsing until now.

I updated the argument parsing. It was also not giving any error if some option was passed but no input file argument at all...

43fae9d15805dd441fd5d4e1ca3ffdaa208588c6
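
The two behaviours discussed here (abort on an unknown option, and abort when no input file is given) can be sketched in Python with `argparse`. This is only an illustration of the expected behaviour, not jpegoptim's actual parsing code; the parser, the `parse` helper, and the `dry_run` attribute name are hypothetical, and only the `--nofix`, `-n` and `-v` options seen in this thread are modelled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # allow_abbrev=False so that only exact long option names are accepted.
    parser = argparse.ArgumentParser(prog="jpegoptim-sketch", allow_abbrev=False)
    parser.add_argument("--nofix", action="store_true",
                        help="skip files for which warnings are reported")
    parser.add_argument("-n", dest="dry_run", action="store_true",
                        help="do not modify any files")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose output")
    # nargs="+" makes at least one input file mandatory.
    parser.add_argument("files", nargs="+", help="JPEG files to process")
    return parser


def parse(argv):
    # argparse prints a message to stderr and raises SystemExit(2) on an
    # unrecognized option or a missing file argument, so no file is ever
    # processed after a typo such as "--no-fix".
    return build_parser().parse_args(argv)
```

With this sketch, `parse(["--nofix", "hidden.jpg"])` returns a namespace with `nofix` set, while `parse(["--no-fix", "hidden.jpg"])` and `parse(["--nofix"])` both exit before any processing could start.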