Open THausherr opened 7 years ago
Not sure if this bug is related to these warnings I get from openjpeg-tools for your file:
stain@biggiebuntu:~/Pictures$ j2k_to_image -i 350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2 -o 350c3f9c-d395-11e6-880f-bbd8fedda5c2.ppm
[WARNING] SOT marker inconsistency in tile 0: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 1: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 2: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 3: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 4: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 5: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 6: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 7: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 8: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 9: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 10: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 11: tile-part index greater (5) than number of tile-parts (5)
This is happening inside FileFormatReader as it is accessing the maps array, which was populated by readComponentMappingBox(12).
Debugging for your file I get that it has read the arrays:
[0, 0, 28784]
[1, -1, -128]
[0, 54, -35]
so something does not look right there.. back in getColorModel() the lut variable only has [[-1], [-1], [-1]], so an index of 54, and even less so -35, is not going to work well.
Any idea of what could be wrong? To the untrained eye it seems to think the image is not truly grayscale but has 3 components, and that these are not the usual RGB but rather 0, 0 and 28784 (aka 0x7070), if that makes any sense. So either the file header is wrong in the depth/component information, something is broken as it reads the COMPONENT_MAPPING_BOX, or something changed in the JPEG2000 spec that is not reflected in this aged reference implementation copied by Sun.
If I convert your picture with j2k_to_image I seem to get a fully black picture.. is that what is intended?
According to IrfanView it is fully white.
It makes sense that it is white: in the PDF that contains the problem file (file 001131 from the digitalcorpora site) a shape is created and then that image is rendered with this shape as clipping path. This appears white with Adobe Reader.
j2k_dump says there is only one comp:
image {
x0=0, y0=0, x1=627, y1=807
numcomps=1
comp 0 {
dx=1, dy=1
prec=8
sgnd=0
}
}
which is consistent with a grayscale picture.. so something is wrong early on in the call to readComponentMappingBox(), which for some reason gets length 12 (divided by 4 makes 3 components) instead of length 4 - thus it reads too far ahead and gets two extra funny components.
No, the length is still 12.. but the second and third component mappings are bogus in the file.
I think something is wrong with that map as it's read from the file.. if I replace it with the map [0,1,2] (as would happen if there was no map), then the image reads fine, and in fact, when output as PNG, it is 100% white.
Checking with Python (or a hex editor) you will find your file says:
>>> import struct
>>> j = open("350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2", "rb").read(1024)
>>> j.find("\x63\x6d\x61\x70") # COMPONENT_MAPPING_BOX
136
>>> j[135]
'\x14'
>>> length = 0x14
>>> length
20
>>> j[135:135+length]
'\x14cmap\x00\x00\x01\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> cmap = j[135:135+length][8:]
>>> cmap
'\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> len(cmap)
12
>>> struct.unpack(">HBB", cmap[0:4])
(0, 0, 255)
>>> struct.unpack(">HBB", cmap[4:8])
(13936, 112, 128)
>>> struct.unpack(">HBB", cmap[8:])
(56576, 0, 0)
(I unpacked according to section I.5.3.5 in [ISO/IEC 15444-1:2002 T.800](http://www.itu.int/rec/T-REC-T.800-200208-S/en), which defines CMP, MTYP and PCOL:)
CMP This field specifies the index of component from the codestream that is mapped to this channel (either directly or through a palette). This field is encoded as a 2-byte big endian unsigned integer.
So this means component index 13936 and 56576 for the second and third channel ... does that make sense?
MTYP This field specifies how this channel is generated from the actual components in the file. This field is encoded as a 1-byte unsigned integer.
Only values 0 and 1 are defined as types for MTYP, so here value 112 for the second channel is way out.. (or is that supported by an extension?)
PCOL This field specifies the index component from the palette that is used to map the actual component from the codestream. This field is encoded as a 1-byte unsigned integer. If the value of the MTYP field for this channel is 0, then the value of this field shall be 0.
Yet it is written as 255 for the first channel, which has MTYP 0..?
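One thing that may be worth double-checking in the Python session above: a JP2 box starts with a 4-byte big-endian length followed by the 4-byte type, so if find() returns 136 for cmap, the box would begin at offset 132 and its payload at offset 140, three bytes earlier than the slice used there. The following is only a sketch under assumptions: the bytes are reconstructed from the slice printed in the session, the three leading zero bytes of the length field are assumed (they were not printed), and parse_cmap_box is a hypothetical helper, not part of any library.

```python
import struct

# Bytes reconstructed from the slice printed in the session. The three
# leading 0x00 bytes of the 4-byte length field are an ASSUMPTION.
box = bytes.fromhex(
    "00000014"   # box length = 20 (8-byte header + 12-byte payload)
    "636d6170"   # box type 'cmap'
    "00000100"   # entry 0
    "0000ff36"   # entry 1
    "707080dd"   # entry 2
)

def parse_cmap_box(buf, type_offset):
    """Hypothetical helper: parse a cmap box given the offset of its type field."""
    start = type_offset - 4  # the length field precedes the type field
    length, box_type = struct.unpack(">I4s", buf[start:start + 8])
    if box_type != b"cmap":
        raise ValueError("not a cmap box: %r" % box_type)
    payload = buf[start + 8:start + length]
    # Each entry is CMP (2 bytes, unsigned), MTYP (1 byte), PCOL (1 byte).
    return [struct.unpack(">HBB", payload[i:i + 4])
            for i in range(0, len(payload), 4)]

entries = parse_cmap_box(box, box.find(b"cmap"))
for cmp_index, mtyp, pcol in entries:
    print(cmp_index, mtyp, pcol)
```

With this alignment the CMP values come out as 0, 0 and 28784, which are the same numbers as the first array seen while debugging, so the reference implementation may well be reading the box at the right position. The second and third entries still look bogus either way.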
I'm afraid I'm not an expert in JPEG 2000 and get a bit confused..
What is confusing is the reference implementation uses signed shorts when the spec says unsigned shorts.. and same for bytes. So there could be multiple sign errors somewhere going unnoticed.
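To illustrate the signedness point, here is a small sketch that unpacks the same four bytes once as unsigned and once as signed fields. The bytes 70 70 80 dd appear in the slice printed in the Python session; treating them as a single cmap entry is an assumption on my part.

```python
import struct

# Four bytes taken from the printed slice; grouping them as one
# CMP/MTYP/PCOL entry is an ASSUMPTION for illustration.
raw = bytes.fromhex("707080dd")

unsigned = struct.unpack(">HBB", raw)  # how the spec defines the fields
signed = struct.unpack(">hbb", raw)    # how Java short/byte would see them

# Masking recovers the unsigned values from the signed ones.
recovered = (signed[0] & 0xFFFF, signed[1] & 0xFF, signed[2] & 0xFF)

print(unsigned)
print(signed)
print(recovered)
```

The signed reading gives -128 and -35 for the two byte fields, which are the negative values that show up in the arrays from the debugger, while the unsigned reading gives 128 and 221. Masking with 0xFF (or 0xFFFF for the short) is the usual way to get the unsigned value back.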
Best bet is that the JP2 file you sent was written with C code for 3 channels - but as just 1 component is used, the mapping for channels 2 and 3 was written with uninitialized (e.g. ~random) data. This reference implementation parses according to spec, which says the number of components is defined by the size of the cmap box - in this case 12 bytes aka 3 components.
(The spec does not say what to do if there's a mismatch)
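Since the spec is silent here, a reader could at least detect the mismatch before using the map. Below is a minimal sketch of such a sanity check. It assumes the three arrays seen in the debugger are CMP, MTYP and PCOL with the bytes reinterpreted as unsigned, and takes the component count of 1 from the j2k_dump output; check_cmap is a hypothetical helper, not something the reference implementation provides.

```python
def check_cmap(entries, num_components):
    """Hypothetical check: return (channel, message) pairs for suspicious entries."""
    problems = []
    for channel, (cmp_index, mtyp, pcol) in enumerate(entries):
        if cmp_index >= num_components:
            problems.append((channel, "CMP %d out of range" % cmp_index))
        if mtyp not in (0, 1):
            problems.append((channel, "MTYP %d is not 0 or 1" % mtyp))
        elif mtyp == 0 and pcol != 0:
            problems.append((channel, "PCOL %d must be 0 when MTYP is 0" % pcol))
    return problems

# ASSUMPTION: debugger arrays read as CMP/MTYP/PCOL, bytes made unsigned.
entries = [(0, 1, 0), (0, 255, 54), (28784, 128, 221)]
problems = check_cmap(entries, num_components=1)  # numcomps=1 per j2k_dump

for channel, message in problems:
    print("channel %d: %s" % (channel, message))
```

What to do once problems are found (ignore the box, keep only the valid entries, or reject the file) is a policy decision that the spec does not settle; ignoring the map is what made the image read fine in the experiment above.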
Another one: (file is a renamed JPEG2000)
my code:
The output:
I'm using version 1.3.0.
(file is a renamed JPEG2000)
According to IrfanView, the file is a JPEG2000 - Wavelet, Grayscale.