openpreserve / jhove

File validation and characterisation.
http://jhove.openpreservation.org
Other
171 stars 79 forks source link

Break out of infinite loop in parseIFDChain() before there is a stackOverflowError() #784

Closed karenhanson closed 1 year ago

karenhanson commented 2 years ago

I came across a JPEG file that appears to have a corrupted subIFD in its first IFD - I assume it's corrupted because the number of fields coming up for it is 0 and the next offset reference points back to byte 8, which is also the first IFD offset (its parent). The result is an ever-looping parseIFDChain() that eventually crashes out with a stackOverflowError:

Exception in thread "Thread-4" java.lang.StackOverflowError
        at java.util.Hashtable.put(Unknown Source)
        at sun.java2d.Disposer.add(Unknown Source)
        at sun.java2d.Disposer.addRecord(Unknown Source)
        at sun.java2d.cmm.lcms.LCMS.loadProfileNative(Native Method)
        at sun.java2d.cmm.lcms.LCMS.loadProfile(Unknown Source)
        at java.awt.color.ICC_Profile.getInstance(Unknown Source)
        at edu.harvard.hul.ois.jhove.NisoImageMetadata.extractIccProfileDescription(NisoImageMetadata.java:2262)
        at edu.harvard.hul.ois.jhove.module.tiff.TiffIFD.lookupTag(TiffIFD.java:2637)
        at edu.harvard.hul.ois.jhove.module.tiff.IFD.parse(IFD.java:361)
        at edu.harvard.hul.ois.jhove.module.tiff.IFD.parse(IFD.java:258)
        at edu.harvard.hul.ois.jhove.module.TiffModule.parseIFDChain(TiffModule.java:1220)
        at edu.harvard.hul.ois.jhove.module.TiffModule.parseIFDChain(TiffModule.java:1246)
        at edu.harvard.hul.ois.jhove.module.TiffModule.parseIFDChain(TiffModule.java:1246)
        at edu.harvard.hul.ois.jhove.module.TiffModule.parseIFDChain(TiffModule.java:1246)
        ... and on and on

The code is not detecting that it is repeatedly visiting the same IFD, and so I copied the solution here into the parseIFDChain() method: https://github.com/openpreserve/jhove/blob/1bb234219c08a0bdce8f15dd003c3fb920e51814/jhove-modules/tiff-hul/src/main/java/edu/harvard/hul/ois/jhove/module/TiffModule.java#L1190-L1192 This way I can re-use the error message and apply the same logic. So this fixes the error by exiting after 50 loops. It might be more elegant to check for duplication in the offset or next values in the list of IFDs, but I wasn't sure if there might be unintended consequences with that route since I'm not familiar with the TIFF Exif format (beyond what I figured out today while debugging). Let me know if you see a better way to fix this and I can try to work on it some more.

codecov[bot] commented 2 years ago

Codecov Report

Patch and project coverage have no change.

Comparison is base (9ff7d97) 46.87% compared to head (dc26346) 46.87%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## integration #784 +/- ## ============================================== Coverage 46.87% 46.87% Complexity 1083 1083 ============================================== Files 57 57 Lines 9050 9050 Branches 1607 1607 ============================================== Hits 4242 4242 Misses 4274 4274 Partials 534 534 ``` Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openpreserve). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=openpreserve)

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

carlwilson commented 1 year ago

I've finished with the PDF module contributions now and will be looking at this next. Are there any available test files for this @karenhanson ?

carlwilson commented 1 year ago

I've tested this, and the only "objection" I have is that the error list is repeated for each run so there are far too many versions of the same error. I have (half) a plan to combat this some but would be interested to hear your thoughts @karenhanson

karenhanson commented 1 year ago

Sorry for the delay, and thanks for taking a look at this - I hadn't noticed the duplicated errors. Just to add our offline conversation to the PR's record... I think the approach you emailed me (paraphrased: if (a) ID error identical (b) offset identical and >-1 (i.e. has a loc) (c) message identical and (d) sub-message identical; then don't print error as it is a duplicate) sounds correct - unless there are likely to be duplicates with no loc, which seems like a bigger issue. If you want me to do a manual re-test after any changes or if there is anything else I can do to help, please let me know.