syntacticsplenda / jbig2-imageio

Automatically exported from code.google.com/p/jbig2-imageio

Can't read some corrupted JBIG2 image #2

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
I'm using jbig2-imageio to read images extracted from a PDF file by iText. 
The image is corrupted, because the EOF is detected before the legal end of the 
image.

Here is the code I'm using to obtain a BufferedImage:

=============================================
    // Needs java.awt.image.BufferedImage, java.io.ByteArrayInputStream, java.io.IOException,
    // java.util.Iterator, javax.imageio.ImageIO, javax.imageio.ImageReader
    // and javax.imageio.stream.ImageInputStream.
    private BufferedImage getBufferedImage(byte[] binary, String fileType) {
        BufferedImage image = null;
        System.out.println("Trying to load image of type " + fileType);
        // try-with-resources closes the stream even when decoding fails
        try (ImageInputStream is = ImageIO.createImageInputStream(new ByteArrayInputStream(binary))) {
            Iterator<ImageReader> readers = ImageIO.getImageReadersBySuffix(fileType);
            // Try each registered reader until one of them decodes the image
            while (is != null && readers.hasNext() && image == null) {
                ImageReader reader = readers.next();
                reader.setInput(is);
                image = reader.read(0);
            }
        } catch (IOException e) {
            System.out.println("Method BarcodeExtractor.getBufferedImage can't load image of type " + fileType);
            e.printStackTrace();
            image = null;
        }

        if (image == null) {
            // Decoding failed: list the suffixes ImageIO can actually read
            System.out.println("Supported suffixes:");
            for (String suffix : ImageIO.getReaderFileSuffixes()) {
                System.out.println("\t" + suffix);
            }
        }
        return image;
    }
============================================

The value of fileType is jbig2.
An EOFException is thrown by the reader. Here is the relevant part of my log:

=====================================================
java.io.EOFException
    at javax.imageio.stream.ImageInputStreamImpl.readBit(Unknown Source)
    at com.levigo.jbig2.SegmentHeader.readAmountOfReferredToSegments(SegmentHeader.java:222)
    at com.levigo.jbig2.SegmentHeader.parse(SegmentHeader.java:146)
    at com.levigo.jbig2.SegmentHeader.<init>(SegmentHeader.java:118)
    at com.levigo.jbig2.JBIG2Document.mapStream(JBIG2Document.java:193)
    at com.levigo.jbig2.JBIG2Document.<init>(JBIG2Document.java:115)
    at com.levigo.jbig2.JBIG2ImageReader.getDocument(JBIG2ImageReader.java:317)
    at com.levigo.jbig2.JBIG2ImageReader.createGrayScaleImage(JBIG2ImageReader.java:223)
    at com.levigo.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:211)
    at javax.imageio.ImageReader.read(Unknown Source)
    at org.gedooo.barcode.BarcodeExtractor.getBufferedImage(BarcodeExtractor.java:164)
========================================================

It would be nice if the reader tried to return a BufferedImage anyway. I think 
it's possible, because when I use "pdfimages", a utility that comes with 
"poppler", all the images are correctly extracted, although some warnings are 
displayed. Example:

pdfimages myFile.pdf img
Error (57995): Unexpected EOF in JBIG2 stream
Error (59126): Unexpected EOF in JBIG2 stream
Error (60544): Unexpected EOF in JBIG2 stream
Error (60947): Unexpected EOF in JBIG2 stream
Error (61434): Unexpected EOF in JBIG2 stream
Error (62312): Unexpected EOF in JBIG2 stream
Error (62819): Unexpected EOF in JBIG2 stream
Error (63462): Unexpected EOF in JBIG2 stream
Error (64063): Unexpected EOF in JBIG2 stream
Error (64544): Unexpected EOF in JBIG2 stream

ls
img-000.ppm  img-002.pbm  img-004.pbm  img-006.pbm  img-008.pbm  img-010.pbm
img-001.pbm  img-003.pbm  img-005.pbm  img-007.pbm  img-009.pbm

I'm using levigo-jbig2-imageio-1.2.jar

The PDF file is generated by a scanner, and I have to deal with that.
Thanks in advance if you can give me some hope. I can try looking at the sources 
of pdfimages to find out how this EOF is handled.

Regards,

Philippe.

Original issue reported on code.google.com by Philippe...@gmail.com on 6 Dec 2011 at 4:17

GoogleCodeExporter commented 8 years ago
Are you able to provide a corrupted file that has been generated by the 
scanner? I will try to reproduce and investigate the behaviour to find a way to 
avoid the EOFException.

Regards,
Matthäus

Original comment by matthaeu...@gmail.com on 7 Dec 2011 at 11:01

GoogleCodeExporter commented 8 years ago
Thanks for your help.

Here is a simple project with all necessary jars and a simple PDF document 
containing a corrupted jbig2 image (in the "resources" folder).

Regards,

Philippe.

Original comment by Philippe...@gmail.com on 7 Dec 2011 at 3:57

GoogleCodeExporter commented 8 years ago
I think that the EOFException is not the actual problem. It seems to be a 
side effect of the end of the image not being detected, or something like 
that.

When launching with "Run as" in Eclipse, I get an OutOfMemoryError. 

==============================================
Trying to load image of type jbig2
7 déc. 2011 16:50:23 com.levigo.jbig2.util.log.JDKLogger info
INFO: JBIG2ReadParam not specified. Default will be used.
7 déc. 2011 16:50:23 com.levigo.jbig2.util.log.JDKLogger info
INFO: Globals not set.
7 déc. 2011 16:50:23 com.levigo.jbig2.util.log.JDKLogger error
GRAVE: No global segment added so far.
7 déc. 2011 16:50:23 com.levigo.jbig2.util.log.JDKLogger error
GRAVE: No global segment added so far.
7 déc. 2011 16:50:23 com.levigo.jbig2.util.log.JDKLogger error
GRAVE: No global segment added so far.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.levigo.jbig2.SegmentHeader.readAmountOfReferredToSegments(SegmentHeader.java:219)
    at com.levigo.jbig2.SegmentHeader.parse(SegmentHeader.java:146)
    at com.levigo.jbig2.SegmentHeader.<init>(SegmentHeader.java:118)
    at com.levigo.jbig2.JBIG2Document.mapStream(JBIG2Document.java:193)
    at com.levigo.jbig2.JBIG2Document.<init>(JBIG2Document.java:115)
    at com.levigo.jbig2.JBIG2ImageReader.getDocument(JBIG2ImageReader.java:317)
    at com.levigo.jbig2.JBIG2ImageReader.getWidth(JBIG2ImageReader.java:111)
    at com.levigo.jbig2.JBIG2ImageReader.getDefaultReadParam(JBIG2ImageReader.java:89)
    at com.levigo.jbig2.JBIG2ImageReader.createGrayScaleImage(JBIG2ImageReader.java:220)
    at com.levigo.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:211)
    at javax.imageio.ImageReader.read(ImageReader.java:940)
    at org.gedooo.debug.ImageExtractor.getBufferedImage(ImageExtractor.java:74)
    at org.gedooo.debug.ImageExtractor.<init>(ImageExtractor.java:49)
    at org.gedooo.debug.JBIG2Test.main(JBIG2Test.java:19)
=======================================

Regards,

Original comment by Philippe...@gmail.com on 7 Dec 2011 at 4:03

GoogleCodeExporter commented 8 years ago
The provided document runs fine in our own tests.

The problem at this point is that handling an embedded JBIG2 stream is a bit 
different from using a standalone JBIG2 file:
* In a standalone file there should be a file header which we can parse. This 
is the default behaviour for an image i/o plugin.
* In embedded JBIG2 streams this file header is missing.

As a workaround, you can instantiate the JBIG2ImageReader manually, using the 
special constructor for embedded JBIG2 data.
It should look like this:
JBIG2ImageReader jbig2ImageReader = new JBIG2ImageReader(new JBIG2ImageReaderSpi(), true);

I will extend the reader to be able to determine if it is a standalone or 
embedded JBIG2 stream by itself. 
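
For illustration, here is one way a caller could tell the two cases apart before choosing a constructor. This is only a sketch, not the check that was later added to `JBIG2ImageReader`. It assumes that a standalone JBIG2 file begins with the eight-byte ID string defined in the JBIG2 specification (0x97 0x4A 0x42 0x32 0x0D 0x0A 0x1A 0x0A); the class and method names are hypothetical.

```java
import java.util.Arrays;

final class Jbig2HeaderCheck {
    // ID string that opens a standalone JBIG2 file.
    // Assumption: taken from the JBIG2 specification, not from this thread.
    private static final byte[] FILE_HEADER_ID = {
        (byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A
    };

    /** Returns true if the data starts with the standalone file header ID. */
    static boolean hasFileHeader(byte[] data) {
        if (data == null || data.length < FILE_HEADER_ID.length) {
            return false; // too short to hold a file header
        }
        byte[] prefix = Arrays.copyOf(data, FILE_HEADER_ID.length);
        return Arrays.equals(prefix, FILE_HEADER_ID);
    }

    public static void main(String[] args) {
        byte[] standalone = {(byte) 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A, 0x01};
        byte[] embedded = {0x00, 0x00, 0x00, 0x00, 0x30, 0x00, 0x01, 0x00, 0x00};
        System.out.println(hasFileHeader(standalone)); // true
        System.out.println(hasFileHeader(embedded));   // false
    }
}
```

If the check returns false, the data is presumably an embedded stream and the constructor with the `true` flag shown above would be the one to try.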

But there is another problem in your implementation. A PDF can contain special 
JBIG2 data that acts like a dictionary for several JBIG2 images. JBIG2 images 
that need the global segments provided by this extra data will fail to decode 
if the segments are not present. You can recognise this case by the log message 
saying that the globals are not set. That message is not necessarily an error; 
it is a notice for developers who use this JBIG2 image reader.
As a hint, I think your implementation should be adjusted to extract such 
globals and provide them to the reader.
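
To make the hint more concrete: in a PDF, the shared segments live in a separate stream referenced by the `JBIG2Globals` entry of the image's decode parameters. One common way to hand them to a decoder is to place them in front of the image's own segments. The sketch below shows only that byte-level step, with hypothetical names. It assumes the caller has already extracted both byte arrays with its PDF library. Nothing in this thread confirms that `JBIG2ImageReader` accepts globals this way rather than through a dedicated API.

```java
final class Jbig2GlobalsJoin {
    /**
     * Returns a new array holding the global segments followed by the image
     * segments. Both inputs are assumed to be raw embedded JBIG2 data taken
     * from the PDF; globals may be null or empty if the image needs none.
     */
    static byte[] prependGlobals(byte[] globals, byte[] imageData) {
        if (globals == null || globals.length == 0) {
            return imageData.clone(); // nothing to prepend
        }
        byte[] joined = new byte[globals.length + imageData.length];
        System.arraycopy(globals, 0, joined, 0, globals.length);
        System.arraycopy(imageData, 0, joined, globals.length, imageData.length);
        return joined;
    }

    public static void main(String[] args) {
        byte[] joined = prependGlobals(new byte[] {1, 2}, new byte[] {3, 4, 5});
        System.out.println(joined.length); // 5
    }
}
```

The result could then be passed as the `binary` argument of a method like `getBufferedImage` above.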

Regards,
Matthaeus Krzikalla

Original comment by matthaeu...@gmail.com on 8 Dec 2011 at 9:44

GoogleCodeExporter commented 8 years ago
Hi,

Here are two other files which may help you:
- corrupted.jb2 is the content of the byte[] obtained at ImageExtractor:48
- a PDF file containing the same image as a PNG, generated with OpenOffice.
The simple test I provided works fine with the latter.

Regards,

Philippe.

Original comment by Philippe...@gmail.com on 8 Dec 2011 at 10:13

GoogleCodeExporter commented 8 years ago
Our messages crossed.

Actually, the PdfImageObject which provides the stream contains a dictionary.
It seems to contain the following entries:
ImageMask, Type, Subtype, Width, BitsPerComponent, Length, Height, Filter

I'll try to take advantage of that.

Thanks again for your help.

Ph. 

Original comment by Philippe...@gmail.com on 8 Dec 2011 at 10:35

GoogleCodeExporter commented 8 years ago
I added some lines to the `JBIG2ImageReader` class (see issue 3 and the new 
commits). The file header is now recognized and parsing should work 
correctly.

Your code should work now, provided that the globals are set if 
necessary.

Best regards,
MK

Original comment by matthaeu...@gmail.com on 8 Dec 2011 at 11:52

GoogleCodeExporter commented 8 years ago
Your fix has solved my problems, but I couldn't find any data that would be 
suitable for setting the globals.

The IndexOutOfBoundsException must also be caught in the function 
reachedEndOfStream(), in the case of embedded images.

=================================
  private boolean reachedEndOfStream(long offset) throws IOException {
    try {
      subInputStream.seek(offset);
      subInputStream.readBits(32);
      return false;
    } catch (EOFException e) {
      return true;
    } catch (IndexOutOfBoundsException e) {
      // Embedded images: reading past the end of the data also means end of stream
      return true;
    }
  }
================================
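
The same probing idea can be tried with JDK classes alone. The sketch below is hypothetical: it uses a `MemoryCacheImageInputStream` as a stand-in for the reader's internal `subInputStream`, and it only shows that seeking to an offset and asking for 32 bits reports end of stream once fewer than four bytes remain.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;

final class EndOfStreamProbe {
    /** Returns true if 32 bits cannot be read starting at the given offset. */
    static boolean reachedEndOfStream(ImageInputStream in, long offset) throws IOException {
        try {
            in.seek(offset);
            in.readBits(32); // throws EOFException when the data runs out
            return false;
        } catch (EOFException | IndexOutOfBoundsException e) {
            // Both exceptions are treated as "no further segment header here"
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8]; // stand-in for embedded JBIG2 bytes
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(data))) {
            System.out.println(reachedEndOfStream(in, 0)); // false
            System.out.println(reachedEndOfStream(in, 6)); // true
        }
    }
}
```

Note that the probe moves the stream position, so a caller would need to seek back before reading real data.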

Thanks again,

Regards,

Philippe

Original comment by Philippe...@gmail.com on 8 Dec 2011 at 3:10

GoogleCodeExporter commented 8 years ago
Also fixed the IOOB exception (see issue 4).

Thanks for reporting!

Regards,
MK

Original comment by matthaeu...@gmail.com on 8 Dec 2011 at 4:40

GoogleCodeExporter commented 8 years ago
Hello,
I need a decoder for the JBIG1 format. Is there any decoder available in Java?
Please also find the attached JBIG1 file.

Original comment by vikas.v...@gmail.com on 13 Jan 2012 at 7:40

GoogleCodeExporter commented 8 years ago
This project is a JBIG2 decoder. The JBIG standard is not supported.

Btw.: Please use the discussion groups if you have any questions.

Original comment by matthaeu...@gmail.com on 13 Jan 2012 at 9:11