haraldk / TwelveMonkeys

TwelveMonkeys ImageIO: Additional plug-ins and extensions for Java's ImageIO
https://haraldk.github.io/TwelveMonkeys/
BSD 3-Clause "New" or "Revised" License

Unable to parse a WMF thumbnail file extracted from OOXML Office #35

Closed thaichat04 closed 10 years ago

thaichat04 commented 10 years ago

This is a simple WMF file extracted from an OOXML Office document thumbnail. WMFImageReaderSpi cannot decode it. The file starts with this header: {01 00 09 00 00 03 82 09 00 00 0C 00 54 00 00 00 00 00 04 00 00 00 03 01 08 00 05 00 00 00 0B 02 00 00 00 00}
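For reference, the quoted bytes can be read as a standard (non-placeable) WMF header: an 18-byte little-endian structure followed directly by the first records. The sketch below decodes those fields. The class name `WmfHeaderDump` is made up for illustration and is not part of TwelveMonkeys or Batik.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of TwelveMonkeys or Batik.
final class WmfHeaderDump {
    final int type;            // 1 = memory metafile, 2 = disk metafile
    final int headerWords;     // header size in 16-bit words (9 for a standard header)
    final int version;         // 0x0100 or 0x0300
    final long fileWords;      // total metafile size in 16-bit words
    final int objectCount;     // number of graphics objects defined in the file
    final long maxRecordWords; // size of the largest record, in 16-bit words

    private WmfHeaderDump(ByteBuffer b) {
        type = b.getShort() & 0xFFFF;
        headerWords = b.getShort() & 0xFFFF;
        version = b.getShort() & 0xFFFF;
        fileWords = b.getInt() & 0xFFFFFFFFL;
        objectCount = b.getShort() & 0xFFFF;
        maxRecordWords = b.getInt() & 0xFFFFFFFFL;
        // The final 16-bit field (number of members) is unused and skipped here.
    }

    static WmfHeaderDump parse(byte[] bytes) {
        if (bytes.length < 18) {
            throw new IllegalArgumentException("A standard WMF header needs 18 bytes");
        }
        // All WMF header fields are little-endian
        return new WmfHeaderDump(ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN));
    }

    public static void main(String[] args) {
        // The first 18 bytes quoted in this issue
        byte[] posted = {
                0x01, 0x00, 0x09, 0x00, 0x00, 0x03, (byte) 0x82, 0x09, 0x00, 0x00,
                0x0C, 0x00, 0x54, 0x00, 0x00, 0x00, 0x00, 0x00
        };
        WmfHeaderDump h = parse(posted);
        System.out.printf("type=%d headerWords=%d version=0x%04X fileWords=%d objects=%d maxRecordWords=%d%n",
                h.type, h.headerWords, h.version, h.fileWords, h.objectCount, h.maxRecordWords);
    }
}
```

Read this way, the quoted bytes describe a type 1 metafile with a 9-word header, version 0x0300, a size of 2434 words (4868 bytes) and 12 objects. There is no placeable magic number in front of it.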

haraldk commented 10 years ago

Hi,

Thanks for reporting!

Please attach the problematic file (all of the extracted WMF file), and I'll have a look at it.

Feel free to provide a patch and/or failing test case too. ;-)

Best regards,

Harald K

thaichat04 commented 10 years ago

Thanks for your reply. I would love to attach the files, but this issue tracker does not allow it. I uploaded them to: http://www.sendspace.com/filegroup/viiLQnCfasXEh9tJf81TnQ

haraldk commented 10 years ago

Follow up: I just realized I had two WMFImageReaderSpis... :-)

Are you using the JMagick wrapper or the Batik wrapper?

I can open the files in Word, so they seem valid. I need to do some digging I guess. Do you have other examples? Or perhaps some resources on the WMF and EMF formats, on how to properly identify them based on contents (file extension won't do).

Harald K

thaichat04 commented 10 years ago

I'm using the Batik wrapper. These files were extracted from docx and xlsx files saved from MS Office 2010, so I guess this is quite a common case.

haraldk commented 10 years ago

One thing you could try (as a workaround) is to dump the files to disk and read them directly using Batik, and see if that works. I did some initial testing, and it didn't look too good, but I might be missing something.

If Batik doesn't recognize the files as WMF or EMF, I don't think it makes much sense to make the ImageIO wrappers recognize them.

Harald K

thaichat04 commented 10 years ago

Thanks for your reply. Batik could not parse these WMF files; it only handles WMF files with an Aldus Placeable header: http://command-line-imageconverterplus.com/news/portable_and_not_portable_metafiles_968.html

So this is not something TwelveMonkeys can fix. The issue can be closed.
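On the earlier question of identifying these formats by content rather than file extension, here is a minimal sketch that sniffs the leading bytes. It assumes the usual signatures: the placeable magic 0x9AC6CDD7, a standard WMF header with type 1 or 2 and a 9-word header size, and an EMF header record (type 1) with the " EMF" signature at offset 40. `MetafileSniffer` and `Kind` are hypothetical names, not an existing API.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of TwelveMonkeys or Batik.
final class MetafileSniffer {
    enum Kind { PLACEABLE_WMF, PLAIN_WMF, EMF, UNKNOWN }

    /** Classifies a metafile from its leading bytes (44 bytes are enough for all checks). */
    static Kind sniff(byte[] head) {
        ByteBuffer b = ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN);

        // Aldus Placeable Metafile: 32-bit magic number at offset 0
        if (head.length >= 4 && b.getInt(0) == 0x9AC6CDD7) {
            return Kind.PLACEABLE_WMF;
        }

        // EMF: first record is the header record (type 1), signature " EMF" at offset 40
        if (head.length >= 44 && b.getInt(0) == 1 && b.getInt(40) == 0x464D4520) {
            return Kind.EMF;
        }

        // Plain WMF: type 1 or 2, header size 9 words, version 0x0100 or 0x0300
        if (head.length >= 6) {
            int type = b.getShort(0) & 0xFFFF;
            int headerWords = b.getShort(2) & 0xFFFF;
            int version = b.getShort(4) & 0xFFFF;
            if ((type == 1 || type == 2) && headerWords == 9
                    && (version == 0x0100 || version == 0x0300)) {
                return Kind.PLAIN_WMF;
            }
        }

        return Kind.UNKNOWN;
    }
}
```

With these checks, the header bytes quoted at the top of this issue classify as a plain WMF, which matches the observation that Batik only accepts the placeable variant.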

haraldk commented 10 years ago

Hi again!

Just got an idea, and it worked! :-)

Try this:

public static void main(String[] args) throws IOException {
    WMFImageReader reader = new WMFImageReader(null);
    File file = new File(args[0]);

    // Inject an Aldus Placeable Metafile header before the plain WMF file
    SequenceInputStream stream = new SequenceInputStream(
            new ByteArrayInputStream(createAldusHeader()),
            new FileInputStream(file)
    );

    reader.setInput(ImageIO.createImageInputStream(stream));
    BufferedImage image = reader.read(0);
    showIt(image, file.getName());
}

/**
 * Creates a (fake) Aldus Placeable Metafile header.
 */
private static byte[] createAldusHeader() {
    byte[] header = {
            (byte) 0xD7, (byte) 0xCD, (byte) 0xC6, (byte) 0x9A,  // magic
            0, 0, // handle
            0, 0, // left
            0, 0, // top
            100, 0, // right
            100, 0, // bottom
            72, 0, // units per inch
            0, 0, 0, 0, // reserved
            0, 0 // checksum
    };

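    // The header is little-endian, so the words must be read in that order for the checksum
    ShortBuffer buffer = ByteBuffer.wrap(header, 0, 20).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();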
    short checksum = 0;
    while (buffer.hasRemaining()) {
        checksum ^= buffer.get();
    }

    header[20] = (byte) (checksum & 0xff);
    header[21] = (byte) (checksum >>> 8);

    return header;
}

You may use (and improve) this code in your extraction if you like.

Best regards,

Harald K

haraldk commented 10 years ago

[Image: foo-wmf]

thaichat04 commented 10 years ago

Nice idea, thanks a lot. By the way, do the header bytes vary from one WMF file to another?

private static final byte[] header = {
        (byte) 0xD7, (byte) 0xCD, (byte) 0xC6, (byte) 0x9A, // magic
        0, 0, // handle
        0, 0, // left
        0, 0, // top
        100, 0, // right
        100, 0, // bottom
        72, 0, // units per inch
        0, 0, 0, 0, // reserved
        0, 0 // checksum
};

I mean, can the right/bottom bounds differ? I ran this code on another WMF file, and the result was cropped to a small part of the entire image.

haraldk commented 10 years ago

Yes,

I guess both top/left and bottom/right may differ, as well as the DPI. This was only meant as a proof of concept. :-)
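One possible source for these values, offered as an assumption and not something verified against the files in this issue, is the metafile itself: many WMF files set their logical window with META_SETWINDOWORG (0x020B) and META_SETWINDOWEXT (0x020C) records near the start. The sketch below walks the record list and returns the first origin/extent pair as bounds. `WmfWindowScanner` is a hypothetical name, and real files may set the window several times or use negative extents, which this does not handle.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of TwelveMonkeys or Batik.
final class WmfWindowScanner {
    private static final int META_SETWINDOWORG = 0x020B;
    private static final int META_SETWINDOWEXT = 0x020C;

    /**
     * Scans a plain (non-placeable) WMF for its window origin and extent.
     *
     * @return {left, top, right, bottom} in logical units, or null if no extent record is found
     */
    static int[] windowBounds(byte[] wmf) {
        if (wmf.length < 18) {
            return null;
        }
        ByteBuffer b = ByteBuffer.wrap(wmf).order(ByteOrder.LITTLE_ENDIAN);
        int pos = (b.getShort(2) & 0xFFFF) * 2; // header size is given in 16-bit words
        int orgX = 0;
        int orgY = 0;

        while (pos + 6 <= wmf.length) {
            long sizeWords = b.getInt(pos) & 0xFFFFFFFFL; // record size in words
            int function = b.getShort(pos + 4) & 0xFFFF;
            if (sizeWords < 3 || function == 0) {
                break; // malformed record or META_EOF
            }
            if ((function == META_SETWINDOWORG || function == META_SETWINDOWEXT)
                    && pos + 10 <= wmf.length) {
                int y = b.getShort(pos + 6); // parameters are stored Y first, then X
                int x = b.getShort(pos + 8);
                if (function == META_SETWINDOWORG) {
                    orgX = x;
                    orgY = y;
                } else {
                    return new int[] {orgX, orgY, orgX + x, orgY + y};
                }
            }
            long next = pos + sizeWords * 2;
            if (next > wmf.length) {
                break; // truncated data
            }
            pos = (int) next;
        }
        return null;
    }
}
```

If such bounds are found, they could be passed as left, top, right and bottom when building the placeable header, in place of the fixed 100 by 100 box from the proof of concept.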

Not sure how the extraction works, but if you have these values available when extracting, you could inject them like this:

(slightly updated example, for easier modification)

/**
 * Creates an Aldus Placeable Metafile header.
 *
 * @return the header as a byte array.
 *
 * @see <a href="http://wvware.sourceforge.net/caolan/ora-wmf.html">Microsoft Windows Metafile</a>
 */
private static byte[] createAldusHeader(int left, int top, int right, int bottom, int unitsPerInch) {
    ByteBuffer buffer = ByteBuffer.allocate(22).order(ByteOrder.LITTLE_ENDIAN);

    buffer.putInt(0x9AC6CDD7); // magic (always 0x9AC6CDD7)
    buffer.putShort((short) 0); // handle (always 0)
    buffer.putShort((short) left); // left (in metafile units; "twips" = 1/1440 of an inch if unitsPerInch is 1440)
    buffer.putShort((short) top); // top
    buffer.putShort((short) right); // right
    buffer.putShort((short) bottom); // bottom
    buffer.putShort((short) unitsPerInch); // metafile units per inch
    buffer.putInt(0); // reserved (always 0)

    buffer.flip();

    // Calculate checksum
    short checksum = 0;
    while (buffer.hasRemaining()) {
        checksum ^= buffer.getShort();
    }

    buffer.limit(buffer.capacity());

    buffer.putShort(checksum); // checksum

    return buffer.array();
}
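Putting the pieces together, here is a sketch of how the header could be prepended only when a stream does not already start with the placeable magic number, so that files that already have the header pass through unchanged. `PlaceableWmfStreams` and `ensurePlaceable` are hypothetical names; the header layout follows the code above, and the stream classes are plain `java.io`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.io.SequenceInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of TwelveMonkeys or Batik.
final class PlaceableWmfStreams {
    private static final int PLACEABLE_MAGIC = 0x9AC6CDD7;

    /** Returns a stream that starts with a placeable header, adding one only if it is missing. */
    static InputStream ensurePlaceable(InputStream in, int left, int top, int right, int bottom,
                                       int unitsPerInch) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 4);

        // Peek at the first 4 bytes, then push them back so nothing is lost
        byte[] head = new byte[4];
        int count = 0;
        while (count < head.length) {
            int read = pushback.read(head, count, head.length - count);
            if (read < 0) {
                break;
            }
            count += read;
        }
        if (count > 0) {
            pushback.unread(head, 0, count);
        }

        boolean placeable = count == 4
                && ByteBuffer.wrap(head).order(ByteOrder.LITTLE_ENDIAN).getInt() == PLACEABLE_MAGIC;
        if (placeable) {
            return pushback;
        }
        byte[] header = createAldusHeader(left, top, right, bottom, unitsPerInch);
        return new SequenceInputStream(new ByteArrayInputStream(header), pushback);
    }

    /** Same 22-byte layout as in the example above: 10 little-endian words plus an XOR checksum. */
    static byte[] createAldusHeader(int left, int top, int right, int bottom, int unitsPerInch) {
        ByteBuffer buffer = ByteBuffer.allocate(22).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putInt(PLACEABLE_MAGIC);
        buffer.putShort((short) 0); // handle
        buffer.putShort((short) left);
        buffer.putShort((short) top);
        buffer.putShort((short) right);
        buffer.putShort((short) bottom);
        buffer.putShort((short) unitsPerInch);
        buffer.putInt(0); // reserved

        short checksum = 0;
        for (int i = 0; i < 20; i += 2) {
            checksum ^= buffer.getShort(i); // absolute read, does not move the position
        }
        buffer.putShort(checksum);
        return buffer.array();
    }
}
```

The returned stream can then be handed to `ImageIO.createImageInputStream(...)` and `reader.setInput(...)` as in the first example.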

Best regards,

Harald K