hp1res / io-tools

Automatically exported from code.google.com/p/io-tools
BSD 3-Clause "New" or "Revised" License

MS Word DOC wrong detection #29

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have the following implementation:

GuessInputStream effectInputStream = GuessInputStream
        .getInstance(
                file,
                null,
                new DetectionLibrary[] { new StreamDetectorImpl(),
                        new DroidDetectorImpl() },
                new Decoder[] { new Base64Decoder(), new Pkcs7Decoder(),
                        new GzipDecoder(), new Bzip2Decoder() });
effectInputStream.decode(false);
effectInputStream.setIdentificationDepth(FileUtil.MAX_NUM_FORMAT_PER_FILE);
FormatEnum[] formats = effectInputStream.getFormats();

When I pass the attached file to that method, I get
"OLE2_COMPOUND_DOCUMENT_FORMAT" instead of the expected "DOC".

I'm using wazformat-1.2.3 on Java 1.6 and Linux Ubuntu 9.04

Original issue reported on code.google.com by stefano....@gmail.com on 11 Jan 2010 at 4:50

Attachments:

GoogleCodeExporter commented 9 years ago
This may be a limitation of the Droid detection library. I recently updated
Droid's identification file. Please try a newer version of Wazformat (1.2.6 or
head) and tell me whether you can still reproduce it.

Original comment by dvd.s...@gmail.com on 15 Jan 2010 at 11:15

GoogleCodeExporter commented 9 years ago
I agree with you, it seems to be a limitation of the Droid identification file:
I can still reproduce this issue with the latest stable release (1.2.6).
Some more information for you:
the library correctly detects as "DOC" the files saved by OpenOffice (v 3.1) in
each variant (Word 6.0, Word for W95, MS-Office 97-2003-xp), but fails with
files saved by MS Office 2007 ("saved as..." MS-Office 97-2003), which are
detected as "OLE2_COMPOUND_DOCUMENT_FORMAT". Of course the .docx is detected as
"ZIP".
I hope for a new revision of the Droid signature file. Thank you for your support!
Best regards,
Stefano

Original comment by stefano....@gmail.com on 18 Jan 2010 at 9:41
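
For readers wondering why both results in the thread are plausible: a Word 97-2003 .doc is stored inside an OLE2 compound file, and a .docx is a ZIP archive, so a detector that only matches the leading signature bytes can report the generic container rather than the document type. The sketch below illustrates this with plain Java. The class and method names (MagicSniffer, sniff, startsWith) are hypothetical and are not part of wazformat or Droid; it only checks the well-known container signatures and makes no attempt to tell DOC apart from other OLE2-based files, which would require looking inside the container (for example for a WordDocument stream).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of wazformat or Droid.
public class MagicSniffer {
    // OLE2 compound file signature (first 8 bytes of the file).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature: 'P' 'K' 0x03 0x04.
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    // True if data begins with the given prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a header by container signature only.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "ZIP";
        }
        return "UNKNOWN";
    }

    public static void main(String[] args) throws IOException {
        byte[] header = new byte[8];
        InputStream in = new FileInputStream(args[0]);
        try {
            // A single read is assumed to be enough for 8 bytes of a local file.
            int n = in.read(header);
            System.out.println(sniff(Arrays.copyOf(header, Math.max(n, 0))));
        } finally {
            in.close();
        }
    }
}
```

Run against a .doc saved by MS Office 2007 in 97-2003 format, a signature check like this can only say "OLE2"; any finer distinction has to come from the contents of the container, which is where the signature file comes in.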