levigo / jbig2-imageio

A Java ImageIO plugin for the JBIG2 bi-level image format
Apache License 2.0

RuntimeException: Can't instantiate segment class #21

Closed THausherr closed 7 years ago

THausherr commented 7 years ago
Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
    at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:405)
    at com.levigo.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:182)
    at com.levigo.jbig2.JBIG2Page.createPage(JBIG2Page.java:154)
    at com.levigo.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:145)
    at com.levigo.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:125)
    at com.levigo.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:223)
    at javaapplicationjbig2test.JavaApplicationJBig2Test.test2(JavaApplicationJBig2Test.java:84)
    at javaapplicationjbig2test.JavaApplicationJBig2Test.main(JavaApplicationJBig2Test.java:52)
Caused by: java.lang.ClassCastException: com.levigo.jbig2.decoder.huffman.ValueNode cannot be cast to com.levigo.jbig2.decoder.huffman.InternalNode
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
    at com.levigo.jbig2.decoder.huffman.HuffmanTable.initTree(HuffmanTable.java:68)
    at com.levigo.jbig2.decoder.huffman.FixedSizeTable.<init>(FixedSizeTable.java:30)
    at com.levigo.jbig2.segments.TextRegion.symbolIDCodeLengths(TextRegion.java:892)
    at com.levigo.jbig2.segments.TextRegion.computeSymbolCodeLength(TextRegion.java:255)
    at com.levigo.jbig2.segments.TextRegion.parseHeader(TextRegion.java:153)
    at com.levigo.jbig2.segments.TextRegion.init(TextRegion.java:901)
    at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:400)
    ... 7 more

jbig2bug.zip

My code:

JBIG2ImageReader reader = (JBIG2ImageReader) ImageIO.getImageReadersByFormatName("JBIG2").next();
JBIG2Globals globals = reader.processGlobals(ImageIO.createImageInputStream(new File(dir,"globals.bin")));
reader.setGlobals(globals);
reader.setInput(ImageIO.createImageInputStream(new File(dir,"img.jbig2")));
BufferedImage image = reader.read(0, reader.getDefaultReadParam());

I'm using your library as part of the Apache PDFBox project. The two data segments come from a PDF that displays correctly in Adobe Reader, so I assume the image data is valid.

ghost commented 7 years ago

Thank you for filing this issue. I'll schedule a look at it.

THausherr commented 7 years ago

Here is the PDF file as well, for future regression tests: 584334-JBig2-p1.pdf

ghost commented 7 years ago

Thanks for providing the test resource!

janpe2 commented 7 years ago

The issue is related to text regions that use Huffman coding. I found two problems in class com.levigo.jbig2.segments.TextRegion. The first problem is in the creation of the symbol ID table. Replace the method symbolIDCodeLengths() with the following:

  private void symbolIDCodeLengths() throws IOException {
    /* 1) - 2) */
    final List<Code> runCodeTable = new ArrayList<Code>();

    for (int i = 0; i < 35; i++) {
      final int prefLen = (int) (subInputStream.readBits(4) & 0xf);
      if (prefLen > 0) {
        runCodeTable.add(new Code(prefLen, 0, i, false));
      }
    }

    if (JBIG2ImageReader.DEBUG)
      log.debug(HuffmanTable.codeTableToString(runCodeTable));

    HuffmanTable ht = new FixedSizeTable(runCodeTable);

    /* 3) - 5) */
    long previousCodeLength = 0;

    int counter = 0;
    final List<Code> sbSymCodes = new ArrayList<Code>();
    while (counter < amountOfSymbols) {
      final long code = ht.decode(subInputStream);
      if (code < 32) {
        if (code > 0) {
          sbSymCodes.add(new Code((int) code, 0, counter, false));
        }

        previousCodeLength = code;
        counter++;
      } else {

        long runLength = 0;
        long currCodeLength = 0;
        if (code == 32) {
          runLength = 3 + subInputStream.readBits(2);
          if (counter > 0) {
            currCodeLength = previousCodeLength;
          }
        } else if (code == 33) {
          runLength = 3 + subInputStream.readBits(3);
        } else if (code == 34) {
          runLength = 11 + subInputStream.readBits(7);
        }

        for (int j = 0; j < runLength; j++) {
          if (currCodeLength > 0) {
            sbSymCodes.add(new Code((int) currCodeLength, 0, counter, false));
          }
          counter++;
        }
      }
    }

    /* 6) - Skip over remaining bits in the last Byte read */
    subInputStream.skipBits();

    /* 7) */
    symbolCodeTable = new FixedSizeTable(sbSymCodes);

  }

According to the standard (ITU-T T.88, page 60), when code is 33 or 34 the for loop must repeat the value 0, not previousCodeLength. previousCodeLength is repeated only when code is 32.
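To make the run-code rule easier to check in isolation, here is a minimal sketch that expands already-decoded run codes into one code length per symbol. It does not use the library: the class name RunCodeExpansion and the {code, extra} token layout are assumptions made for illustration, with "extra" standing in for the value that readBits() would return. Unlike the method above, it keeps zero lengths in the output so the effect of codes 33 and 34 is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jbig2-imageio.
final class RunCodeExpansion {

  /**
   * Expands run codes into code lengths.
   * Each token is {code, extra}; 'extra' is the value of the extra bits
   * read for codes 32..34 and is ignored for codes below 32.
   */
  public static List<Integer> expand(int[][] tokens) {
    final List<Integer> lengths = new ArrayList<>();
    int previousCodeLength = 0;

    for (final int[] token : tokens) {
      final int code = token[0];
      final int extra = token[1];

      if (code < 32) {
        // A plain code length for a single symbol.
        lengths.add(code);
        previousCodeLength = code;
        continue;
      }

      final int runLength;
      int value = 0; // codes 33 and 34 always repeat 0
      if (code == 32) {
        runLength = 3 + extra; // 2 extra bits in the real stream
        if (!lengths.isEmpty()) {
          value = previousCodeLength; // only code 32 repeats the previous length
        }
      } else if (code == 33) {
        runLength = 3 + extra; // 3 extra bits in the real stream
      } else if (code == 34) {
        runLength = 11 + extra; // 7 extra bits in the real stream
      } else {
        throw new IllegalArgumentException("Unknown run code: " + code);
      }

      for (int j = 0; j < runLength; j++) {
        lengths.add(value);
      }
    }
    return lengths;
  }

  public static void main(String[] args) {
    final int[][] tokens = { {5, 0}, {32, 1}, {33, 0}, {2, 0} };
    System.out.println(expand(tokens));
  }
}
```

As in the method above, previousCodeLength is updated only by plain codes below 32, so a run of zeros does not change what a later code 32 repeats.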

The second problem is in the method getUserTable() of the same class, TextRegion: its search algorithm does not work. A correct implementation of the search already exists in getUserTable() in class SymbolDictionary. Copy that method into TextRegion, replacing the old one. Here is the copied method:

  private HuffmanTable getUserTable(final int tablePosition) throws InvalidHeaderValueException, IOException {
    int tableCounter = 0;

    for (final SegmentHeader referredToSegmentHeader : segmentHeader.getRtSegments()) {
      if (referredToSegmentHeader.getSegmentType() == 53) {
        if (tableCounter == tablePosition) {
          final Table t = (Table) referredToSegmentHeader.getSegmentData();
          return new EncodedTable(t);
        } else {
          tableCounter++;
        }
      }
    }
    return null;
  }

This second fix, for getUserTable(), might not be necessary for the current issue (#21), but it is needed to fix an earlier, still-open issue 16 at Google Code. That issue has attached PDF and JBIG2 files that use a user table in a text region.
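The lookup logic can also be tried without the library. The sketch below is only an illustration: NthTableLookup and its Segment class are hypothetical stand-ins for the referred-to segment headers, and only the segment type value 53 is taken from the method above. It shows that the counter advances only on table segments, so other referred-to segments in between do not shift the position.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jbig2-imageio.
final class NthTableLookup {

  /** Stand-in for a referred-to segment: just a type and a label. */
  static final class Segment {
    final int type;
    final String label;

    Segment(int type, String label) {
      this.type = type;
      this.label = label;
    }
  }

  static final int TABLE_SEGMENT_TYPE = 53;

  /** Returns the table segment at the given zero-based position, or null if there is none. */
  public static Segment findTable(List<Segment> referredTo, int tablePosition) {
    int tableCounter = 0;
    for (final Segment segment : referredTo) {
      if (segment.type != TABLE_SEGMENT_TYPE) {
        continue; // non-table segments must not advance the counter
      }
      if (tableCounter == tablePosition) {
        return segment;
      }
      tableCounter++;
    }
    return null;
  }

  public static void main(String[] args) {
    final List<Segment> referredTo = Arrays.asList(
        new Segment(0, "symbol dictionary"),
        new Segment(53, "table A"),
        new Segment(0, "symbol dictionary"),
        new Segment(53, "table B"));
    System.out.println(findTable(referredTo, 1).label);
  }
}
```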

ghost commented 7 years ago

Thank you. Awesome! I'll compose a pull-request with your suggested fix.

ghost commented 7 years ago

See #22

janpe2 commented 7 years ago

@THausherr I noticed that the decoded image produced by your JBIG2 file is slightly incomplete. Some text is missing at the bottom of the page: "Hard Copy Not Controlled..." (in a rectangular frame) and "Contract No ...". If I open your PDF in Acrobat Reader, those texts are shown. My pull request #29 should fix this problem and make the texts visible.