jai-imageio / jai-imageio-jpeg2000

JPEG2000 support for Java Advanced Imaging Image I/O Tools API

Can't read gray jpg2000 file #9

Open THausherr opened 7 years ago

THausherr commented 7 years ago

my code:

        ImageReader reader = imageReadersByFormatName.next();
        System.out.println("reader.canReadRaster(): " + reader.canReadRaster());
        ImageInputStream iis = ImageIO.createImageInputStream(new File("x.jp2"));
        reader.setInput(iis, true, true);
        BufferedImage image = reader.read(0);

The output:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 54
    at jj2000.j2k.fileformat.reader.FileFormatReader.getColorModel(FileFormatReader.java:680)
    at com.github.jaiimageio.jpeg2000.impl.J2KReadState.getColorModel(J2KReadState.java:935)
    at com.github.jaiimageio.jpeg2000.impl.J2KReadState.readBufferedImage(J2KReadState.java:343)
    at com.github.jaiimageio.jpeg2000.impl.J2KImageReader.read(J2KImageReader.java:441)
    at javax.imageio.ImageReader.read(ImageReader.java:939)

I'm using version 1.3.0.

x.jp2 (file is a renamed JPEG2000)

According to IrfanView, the file is a JPEG2000 - Wavelet, Grayscale.

stain commented 7 years ago

Not sure if this bug is related to these warnings I get from openjpeg-tools for your file:

stain@biggiebuntu:~/Pictures$ j2k_to_image -i 350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2  -o 350c3f9c-d395-11e6-880f-bbd8fedda5c2.ppm

[WARNING] SOT marker inconsistency in tile 0: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 1: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 2: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 3: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 4: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 5: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 6: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 7: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 8: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 9: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 10: tile-part index greater (5) than number of tile-parts (5)
[WARNING] SOT marker inconsistency in tile 11: tile-part index greater (5) than number of tile-parts (5)

stain commented 7 years ago

This happens inside FileFormatReader when it accesses the maps array, which was populated by readComponentMappingBox(12).

Debugging for your file I get that it has read the arrays:

So something does not look right there. Back in getColorModel(), the lut variable only has [[-1], [-1], [-1]], so an index of 54, and even less so -35, is not going to work.

Any idea what could be wrong? To the untrained eye, the reader seems to think the image is not truly grayscale but has 3 components, and not the usual RGB ones but 0, 0 and 28784 (aka 0x7070), if that makes any sense. So either the file header is wrong in the depth/component information, something breaks as it reads the COMPONENT_MAPPING_BOX, or something changed in the JPEG2000 spec that is not reflected in this aged reference implementation copied by Sun.

stain commented 7 years ago

If I convert your picture with j2k_to_image I seem to get a fully black picture.. is that what is intended?

THausherr commented 7 years ago

According to IrfanView it is fully white.

It makes sense that it is white: in the PDF that contains the problem file (file 001131 from the digitalcorpora site), a shape is created and then the image is rendered with this shape as clipping path. This appears white in Adobe Reader.

stain commented 7 years ago

j2k_dump says there is only one comp:

image {
  x0=0, y0=0, x1=627, y1=807
  numcomps=1
  comp 0 {
    dx=1, dy=1
    prec=8
    sgnd=0
  }
}

which is consistent with a grayscale picture. So something goes wrong early on, in the call to readComponentMappingBox(), which for some reason gets length 12 (divided by 4, that makes 3 components) instead of length 4. It therefore reads too far ahead and picks up two extra bogus components.

stain commented 7 years ago

No, the length is still 12.. but the second and third component mappings are bogus in the file.

I think something is wrong with that map as it is read from the file. If I replace it with the map [0,1,2] (as would happen if there was no map), then the image reads fine and, in fact, is 100% white when output as PNG.

stain commented 7 years ago

Checking with Python (or a hex editor) you will find your file says:


>>> import struct
>>> j = open("350c3f9c-d395-11e6-880f-bbd8fedda5c2.jp2", "rb").read(1024)
>>> j.find("\x63\x6d\x61\x70") # COMPONENT_MAPPING_BOX
136
>>> j[135]
'\x14'
>>> length = 0x14
>>> length
20
>>> j[135:135+length]
'\x14cmap\x00\x00\x01\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> cmap = j[135:135+length][8:]
>>> cmap
'\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00'
>>> len(cmap)
12
>>> struct.unpack(">HBB", cmap[0:4])
(0, 0, 255)
>>> struct.unpack(">HBB", cmap[4:8])
(13936, 112, 128)
>>> struct.unpack(">HBB", cmap[8:])
(56576, 0, 0)

(I unpacked according to section I.5.3.5 in [ISO/IEC 15444-1:2002 T.800](http://www.itu.int/rec/T-REC-T.800-200208-S/en), which defines CMP, MTYP and PCOL:)

CMP This field specifies the index of component from the codestream that is mapped to this channel (either directly or through a palette). This field is encoded as a 2-byte big endian unsigned integer.

So this means component index 13936 and 56576 for the second and third channel ... does that make sense?

MTYP This field specifies how this channel is generated from the actual components in the file. This field is encoded as a 1-byte unsigned integer.

Only value 0 and 1 are defined as types for MTYP, so here value 112 for the second channel is way out.. (or is that supported by an extension?)

PCOL This field specifies the index component from the palette that is used to map the actual component from the codestream. This field is encoded as a 1-byte unsigned integer. If the value of the MTYP field for this channel is 0, then the value of this field shall be 0.

Yet it is written as 255 for the first channel which has MTYP 0..?

I'm afraid I'm not an expert in JPEG 2000 and get a bit confused..
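
One thing worth double-checking in the session above is the byte offset. A JP2 box starts with a 4-byte big-endian length (LBox) followed by the 4-byte type (TBox), so if `cmap` was found at offset 136, the box would start at 132 and its payload at 140 rather than 143. The sketch below re-slices the bytes printed above under that reading. The three leading zero bytes of the length field are an assumption (they are not shown in the session), and `parse_cmap` is a hypothetical helper, not part of any library.

```python
import struct

# j[135:155] exactly as printed in the session above. The leading 0x14 is
# taken to be the LAST byte of the 4-byte LBox field.
tail = b"\x14cmap\x00\x00\x01\x00\x00\x00\xff6pp\x80\xdd\x00\x00\x00"

# Assumption: the three bytes before 0x14 are zero (LBox == 0x00000014 == 20).
box = b"\x00\x00\x00" + tail[:17]  # LBox(4) + TBox(4) + payload(12)


def parse_cmap(box):
    """Return a list of (CMP, MTYP, PCOL) tuples from a complete cmap box."""
    length, box_type = struct.unpack(">I4s", box[:8])
    if box_type != b"cmap":
        raise ValueError("not a cmap box")
    payload = box[8:length]
    # Each entry is 4 bytes: unsigned short CMP, unsigned byte MTYP, unsigned byte PCOL.
    return [struct.unpack(">HBB", payload[i:i + 4])
            for i in range(0, len(payload), 4)]


entries = parse_cmap(box)
for cmp_index, mtyp, pcol in entries:
    print(cmp_index, mtyp, pcol)
```

Under this alignment the entries come out as (0, 1, 0), (0, 255, 54) and (28784, 128, 221). The CMP values 0, 0 and 28784 and the PCOL values 54 and 221 (which is -35 as a signed byte) would line up with the 0x7070, the index 54 and the -35 reported from the Java debugger earlier in the thread. Either way, the second and third entries still look bogus.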

stain commented 7 years ago

What is confusing is the reference implementation uses signed shorts when the spec says unsigned shorts.. and same for bytes. So there could be multiple sign errors somewhere going unnoticed.
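
To illustrate why the signedness matters, here is a minimal sketch (in Python rather than Java, to match the session above) that reads the same raw bytes both ways. The byte values are taken from the dump earlier in the thread; the variable names are only illustrative.

```python
import struct

# Two raw values taken from the byte dump earlier in the thread.
raw_byte = b"\xdd"
raw_short = b"\xdd\x00"

unsigned_byte = struct.unpack(">B", raw_byte)[0]    # what the spec asks for
signed_byte = struct.unpack(">b", raw_byte)[0]      # what a Java `byte` holds
unsigned_short = struct.unpack(">H", raw_short)[0]  # what the spec asks for
signed_short = struct.unpack(">h", raw_short)[0]    # what a Java `short` holds

print(unsigned_byte, signed_byte)    # 221 -35
print(unsigned_short, signed_short)  # 56576 -8960

# Masking recovers the unsigned value from the signed one; this is the
# usual idiom in Java as well (b & 0xFF, s & 0xFFFF).
print(signed_byte & 0xFF, signed_short & 0xFFFF)  # 221 56576
```

A negative value used as an array index fails immediately, which is at least easy to spot; a value that is merely large but wrong is harder to notice.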

stain commented 7 years ago

Best bet is that the JP2 file you sent was written by C code set up for 3 channels, but as just 1 component is used, the mapping for channels 2 and 3 was written with uninitialized (e.g. ~random) data. This reference implementation parses according to spec, which says the number of components is defined by the size of the cmap box, in this case 12 bytes aka 3 components.

(The spec does not say what to do if there's a mismatch)
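
Since the spec is silent on a mismatch, one possible defensive approach is to sanity-check each entry against the rules quoted above and ignore the whole map if anything is off. The following is only a sketch of that idea, not what the reference implementation does: `check_cmap` and `mapping_or_identity` are hypothetical names, the fallback to a direct one-to-one mapping is an assumption modelled on the "no map" experiment earlier in the thread, and the input takes the three tuples printed in the Python session at face value.

```python
def check_cmap(entries, num_components):
    """Return a list of problems found; an empty list means the map looks sane."""
    problems = []
    for channel, (cmp_index, mtyp, pcol) in enumerate(entries):
        # CMP must refer to a component that exists in the codestream.
        if cmp_index >= num_components:
            problems.append("channel %d: CMP %d, but only %d component(s)"
                            % (channel, cmp_index, num_components))
        # Only MTYP 0 (direct) and 1 (palette) are defined.
        if mtyp not in (0, 1):
            problems.append("channel %d: undefined MTYP %d" % (channel, mtyp))
        # For MTYP 0 the spec says PCOL shall be 0.
        elif mtyp == 0 and pcol != 0:
            problems.append("channel %d: MTYP 0 requires PCOL 0, got %d"
                            % (channel, pcol))
    return problems


def mapping_or_identity(entries, num_components):
    """Use the map if it passes the checks, else map each component directly."""
    if check_cmap(entries, num_components):
        return [(i, 0, 0) for i in range(num_components)]
    return list(entries)


# Tuples as unpacked in the session above; numcomps=1 as reported by j2k_dump.
entries = [(0, 0, 255), (13936, 112, 128), (56576, 0, 0)]
problems = check_cmap(entries, 1)
for line in problems:
    print(line)
mapping = mapping_or_identity(entries, 1)
print(mapping)
```

For this file the check flags four problems and the fallback yields a single direct mapping for component 0, which matches the grayscale image that j2k_dump describes.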

THausherr commented 2 months ago

Another one: PDFJS-11306 jp2 (file is a renamed JPEG2000)