Closed orencohendev closed 1 year ago
I can confirm I'm able to reproduce this result. Here are the tests that failed validation:
<tests>
<contiguousCodestreamBox>
<foundExpectedNumberOfTiles>False</foundExpectedNumberOfTiles>
<foundExpectedNumberOfTileParts>False</foundExpectedNumberOfTileParts>
<tileParts>
<tilePart>
<sot>
<psotIsValid>False</psotIsValid>
</sot>
<foundNextTilePartOrEOC>False</foundNextTilePartOrEOC>
</tilePart>
</tileParts>
</contiguousCodestreamBox>
</tests>
This is not a Jpylyzer bug; it indicates a problem with the JP2 itself. In particular, the actual number of tiles in this file does not match the expected number of tiles (as defined by the file's SIZ marker segment).
This is reproducing for me on other GIS-related JP2s from USGS. Could it be different behavior for files that are meant for GIS usage?
Hi Oren,
I just gave this a closer look, and what's happening is basically this.
Based on the overall geometry of the image which is defined in the SIZ marker, Jpylyzer calculates the expected number of tiles. Details are given here: https://jpylyzer.openpreservation.org/doc/latest/userManual.html#siz-marker.
In this case this yields a number of 12 expected tiles. This is reported by Jpylyzer as the numberOfTiles property (I uploaded the full Jpylyzer output for this image here).
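To make the calculation concrete, here is a minimal sketch of how the expected tile count follows from the SIZ geometry. The function name and the image and tile dimensions are hypothetical (they are not the values from this file); the parameter names only mirror the SIZ field names.

```python
def expected_number_of_tiles(xsiz, ysiz, xtsiz, ytsiz, xtosiz=0, ytosiz=0):
    """Expected tile count from SIZ geometry (illustrative helper, not Jpylyzer's code).

    xsiz, ysiz     : width and height of the reference grid
    xtsiz, ytsiz   : width and height of one tile
    xtosiz, ytosiz : offset of the first tile from the grid origin
    """
    # Integer ceiling division: -(-a // b) == ceil(a / b) for positive b
    tiles_x = -(-(xsiz - xtosiz) // xtsiz)
    tiles_y = -(-(ysiz - ytosiz) // ytsiz)
    return tiles_x * tiles_y


# Hypothetical 4000 x 3000 grid with 1024 x 1024 tiles: 4 columns x 3 rows
print(expected_number_of_tiles(4000, 3000, 1024, 1024))
```

Partial tiles at the right and bottom edges still count as tiles, which is why the division is rounded up.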
Each tile is made up of one or more tile parts. Each tile part starts with a start-of-tile-part marker (SOT), which defines a set of properties described here: https://jpylyzer.openpreservation.org/doc/latest/userManual.html#sot-marker.
One of these properties is the tile index (reported as property isot), which defines the tile to which a tile-part belongs. Here's an example of Jpylyzer's output for one tile part:
<sot>
<lsot>10</lsot>
<isot>0</isot>
<psot>1134</psot>
<tpsot>0</tpsot>
<tnsot>255</tnsot>
</sot>
For a JP2 with 12 different tiles, you would expect values in the range of 0 to 11. But if you look at Jpylyzer's full output, you'll see only 3 values for isot: 0, 1 and 2. So the tile parts only cover 3 out of the 12 tiles that are part of this image!
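The same check can be done automatically on Jpylyzer's XML output. The sketch below is only an illustration: the embedded XML is a heavily trimmed, hypothetical stand-in for real output, and the helper matches elements by local name so that it does not depend on whichever namespace the output uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed stand-in for Jpylyzer output (not the real file's output)
SAMPLE = """
<jpylyzer>
  <properties>
    <numberOfTiles>12</numberOfTiles>
    <tileParts>
      <tilePart><sot><isot>0</isot></sot></tilePart>
      <tilePart><sot><isot>1</isot></sot></tilePart>
      <tilePart><sot><isot>2</isot></sot></tilePart>
      <tilePart><sot><isot>0</isot></sot></tilePart>
    </tileParts>
  </properties>
</jpylyzer>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def missing_tile_indices(xml_text):
    """Return the tile indices that no tile part refers to."""
    root = ET.fromstring(xml_text)
    expected = None
    seen = set()
    for element in root.iter():
        name = local_name(element.tag)
        if name == "numberOfTiles":
            expected = int(element.text)
        elif name == "isot":
            seen.add(int(element.text))
    if expected is None:
        raise ValueError("no numberOfTiles element found")
    return sorted(set(range(expected)) - seen)


print(missing_tile_indices(SAMPLE))
```

An empty list would mean every expected tile has at least one tile part.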
Another red flag is the following error:
<foundNextTilePartOrEOC>False</foundNextTilePartOrEOC>
This error happens while Jpylyzer is iterating over the tile parts. For each new iteration in this loop, for a structurally valid JP2 only two outcomes are possible: either the next tile part is found (SOT marker, 0xFF90), or the end of the codestream is reached (EOC marker, 0xFFD9). Anything different from this indicates a structurally malformed or damaged file.
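As a rough illustration of that loop (a simplified sketch, not Jpylyzer's actual implementation), the code below walks a codestream from the first SOT marker, using each tile part's Psot length to jump to where the next marker should be. The function names and the synthetic test data are made up for this example.

```python
import struct

SOT = 0xFF90  # start of tile part
EOC = 0xFFD9  # end of codestream


def walk_tile_parts(data, offset=0):
    """Return (tile indices seen, True if every jump landed on SOT or EOC)."""
    tile_indices = []
    while True:
        if offset + 2 > len(data):
            return tile_indices, False  # ran off the end without seeing EOC
        (marker,) = struct.unpack_from(">H", data, offset)
        if marker == EOC:
            return tile_indices, True
        if marker != SOT or offset + 12 > len(data):
            return tile_indices, False  # neither SOT nor EOC: malformed
        # SOT segment: Lsot (2 bytes), Isot (2), Psot (4), TPsot (1), TNsot (1)
        lsot, isot, psot, tpsot, tnsot = struct.unpack_from(">HHIBB", data, offset + 2)
        tile_indices.append(isot)
        if psot == 0:
            # Psot of 0 means the tile part runs up to the EOC marker
            offset = len(data) - 2
        else:
            # Psot counts from the first byte of the SOT marker
            offset += psot


def make_tile_part(isot, payload):
    """Build a synthetic tile part: SOT segment + SOD marker + payload."""
    psot = 2 + 10 + 2 + len(payload)
    return struct.pack(">HHHIBB", SOT, 10, isot, psot, 0, 1) + b"\xff\x93" + payload


good = make_tile_part(0, b"abc") + make_tile_part(1, b"de") + b"\xff\xd9"
broken = make_tile_part(0, b"abc") + b"\x00\x00"
print(walk_tile_parts(good))
print(walk_tile_parts(broken))
```

In the second case the jump lands on bytes that are neither an SOT nor an EOC marker, which corresponds to the situation where foundNextTilePartOrEOC is reported as False.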
Out of curiosity I opened your JP2 in a Hex editor. Towards the end of the file I saw this:
Basically this looks like a sequence of Start-Of-Tilepart markers (marker code highlighted in red), each followed by a Start-Of-Data marker (SOD, 0xFF93). The SOD is supposed to be followed by the tile part's actual bit stream data, but instead there's just a new SOT, and the bit stream data are missing altogether!
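A pattern like this can also be searched for programmatically. The following is a crude heuristic sketch, not a validator: it assumes the SOD marker directly follows the 10-byte SOT segment (that is, there are no other marker segments in the tile-part header), and the function name and test bytes are invented for illustration.

```python
import struct

SOT = b"\xff\x90"  # start of tile part
SOD = b"\xff\x93"  # start of data
EOC = b"\xff\xd9"  # end of codestream


def count_empty_tile_parts(data):
    """Count SOT segments whose SOD is immediately followed by SOT or EOC."""
    count = 0
    pos = data.find(SOT)
    while pos != -1:
        sod_at = pos + 12  # 2-byte SOT marker + 10-byte SOT segment
        next_marker = data[sod_at + 2:sod_at + 4]
        if data[sod_at:sod_at + 2] == SOD and next_marker in (SOT, EOC):
            count += 1
        pos = data.find(SOT, pos + 2)
    return count


# Synthetic data: one tile part with a payload, then three empty ones
sot_segment = SOT + struct.pack(">HHIBB", 10, 0, 14, 0, 1)
empty_part = sot_segment + SOD
data = sot_segment + SOD + b"data" + empty_part * 3 + EOC
print(count_empty_tile_parts(data))
```

Run against a real file, a large count would be consistent with the hex dump described above, but a proper check should parse the codestream rather than pattern-match bytes.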
As a further test I tried to decode the image with OpenJPEG's opj_decompress tool. This worked, but resulted in an endless list of this warning:
[WARNING] Empty SOT marker detected: Psot=12.
[WARNING] Empty SOT marker detected: Psot=12.
[WARNING] Empty SOT marker detected: Psot=12.
[WARNING] Empty SOT marker detected: Psot=12.
[WARNING] Empty SOT marker detected: Psot=12.
::
Which also indicates the presence of empty tile parts.
Since you mention that other USGS JP2s are also affected, my best guess is that the production workflow they're using has some serious flaws, resulting in missing data and, ultimately, a malformed overall file structure. The fact that this is a 4-channel image that is meant for GIS usage has nothing to do with this, because this doesn't affect the overall file structure.
Small addition - I found this old (2017) thread in an xnview forum, where someone reports the exact same problem:
https://newsgroup.xnview.com/viewtopic.php?t=35877
I also found this:
https://www.sciencebase.gov/catalog/item/58282427e4b01fad870f9744
The image that is linked to on that page results in the same validation errors. So this could mean a lot of affected images!
Closing this issue as this looks like a fault of the USGS images, not Jpylyzer.
Here's what I did:
The result is
False
The file is clearly a JP2 with geographical data and can be viewed in QGIS. Here's an example file to reproduce this with: https://prd-tnm.s3.amazonaws.com/StagedProducts/NAIP/ca_2016/37122/m_3712213_se_10_h_20160625_20161004.jp2