j256 / simplemagic

Simple file magic number and content-type library which provides mime-type determination from files and byte arrays
http://256stuff.com/sources/simplemagic/
ISC License

Doesn't recognize bitmap files exported from GIMP #58

Open Gurfuzle opened 6 years ago

Gurfuzle commented 6 years ago

When I export images from GIMP as bitmaps, the library does not recognize their magic number. When I run the file through xxd, I get:

00000000: 424d 7a75 0200 0000 0000 7a04 0000 6c00  BMzu......z...l.
00000010: 0000 9001 0000 9001 0000 0100 0800 0000  ................
00000020: 0000 0071 0200 232e 0000 232e 0000 0001  ...q..#...#.....
00000030: 0000 0001 0000 4247 5273 0000 0000 0000  ......BGRs......

Which does start with the 424d, but it fails to be recognized as a bitmap.

Gurfuzle commented 6 years ago

Here's an example file (zipped): example.bmp.zip

j256 commented 6 years ago

Great example Mike. Thanks much.

CrushaKRool commented 4 years ago

I've actually stumbled upon this myself and investigated a bit. The problem lies in MagicEntries.optimizeFirstBytes(), which calls MagicEntry.getStartsWithByte() -> StringType.getStartingBytes() -> StringType$TestInfo.getStartingBytes(). This always returns null if the string is less than 4 characters long, which means that every file type whose magic string pattern is shorter than 4 characters is left out of the optimization index and is never actually considered during subsequent matching attempts. Since the Bitmap format starts with only two fixed characters, BM, it also falls victim to this rule. The calling code only ever uses the first byte anyway, so requiring more than that seems unnecessary.

j256 commented 4 years ago

Appreciate the look, @CrushaKRool. The code is supposed to use the first-byte stuff and then fall through to findMatch(). See https://github.com/j256/simplemagic/blob/211cf35f7a827958e78aba0c15ec4c8dcfe0699a/src/main/java/com/j256/simplemagic/entries/MagicEntries.java#L122

Let me get this test in place and then debug it.
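
The lookup order described here (first-byte index, then fall through to the full list) can be sketched roughly as follows. This is a simplified illustration, not the simplemagic source: the class FirstByteIndexSketch and its methods are hypothetical, and the only details taken from the thread are the rule that patterns shorter than 4 characters are left out of the index and the fall-through to a full scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not the real simplemagic classes.
class FirstByteIndexSketch {

    /** A named byte pattern that must match at offset 0. */
    private static final class Entry {
        final String name;
        final byte[] pattern;

        Entry(String name, String pattern) {
            this.name = name;
            this.pattern = pattern.getBytes(StandardCharsets.US_ASCII);
        }

        boolean matches(byte[] data) {
            if (data.length < pattern.length) {
                return false;
            }
            for (int i = 0; i < pattern.length; i++) {
                if (data[i] != pattern[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    private final List<Entry> all = new ArrayList<>();
    private final Map<Byte, List<Entry>> byFirstByte = new HashMap<>();

    void add(String name, String pattern) {
        Entry entry = new Entry(name, pattern);
        all.add(entry);
        // Rule described in the thread: short patterns are not indexed.
        if (entry.pattern.length >= 4) {
            byFirstByte.computeIfAbsent(entry.pattern[0], k -> new ArrayList<>()).add(entry);
        }
    }

    boolean isIndexed(String pattern) {
        List<Entry> candidates = byFirstByte.get((byte) pattern.charAt(0));
        return candidates != null
                && candidates.stream().anyMatch(e -> new String(e.pattern, StandardCharsets.US_ASCII).equals(pattern));
    }

    String find(byte[] data) {
        if (data.length == 0) {
            return null;
        }
        // Fast path: only entries indexed under the first byte.
        List<Entry> candidates = byFirstByte.get(data[0]);
        if (candidates != null) {
            for (Entry entry : candidates) {
                if (entry.matches(data)) {
                    return entry.name;
                }
            }
        }
        // Fall through: scan every entry, including the unindexed short ones.
        for (Entry entry : all) {
            if (entry.matches(data)) {
                return entry.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FirstByteIndexSketch index = new FirstByteIndexSketch();
        index.add("gif", "GIF8");
        index.add("bitmap", "BM");
        byte[] data = "BMzu".getBytes(StandardCharsets.US_ASCII);
        System.out.println(index.isIndexed("BM") + " " + index.find(data));
    }
}
```

With this shape, a two-character pattern such as BM is absent from the index but is still found by the fall-through scan, which is why the missing index entry alone would not explain the failure.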

CrushaKRool commented 4 years ago

Ah, you are right. I overlooked that.

Debugging it further, the library seems to identify the first magic bytes as Bitmap but fails to match any of the child formats, which require the byte at index 14 to be 12, 40, 64 or 128. In my case it's 124, though (exported from GIMP). Unfortunately, since the name of the parent MagicEntry for bitmap is "unknown" and none of the children override it with something else, the name ends up as "unknown" in the ContentData and no mime types are set. And the method is coded to return null as ContentInfo in that case.

https://github.com/j256/simplemagic/blob/074a1fd5b13dc614ba9bffa7702232fdd6130231/src/main/java/com/j256/simplemagic/entries/MagicEntry.java#L64-L67
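
The failure mode described above can be reduced to a small sketch. It is an assumption-laden illustration rather than the library's code: ParentChildSketch, identify() and the returned labels are hypothetical, and only the parent name "unknown", the child values 12, 40, 64 and 128 at index 14, and the null result come from the description in this thread.

```java
// Hypothetical sketch of a parent entry with child entries; not the simplemagic source.
class ParentChildSketch {

    static final String UNKNOWN = "unknown";

    /** Returns a label for the data, or null if no usable name was produced. */
    static String identify(byte[] data) {
        // Parent entry: the two magic bytes "BM". Its own name is only "unknown".
        if (data.length < 2 || data[0] != 'B' || data[1] != 'M') {
            return null;
        }
        String name = UNKNOWN;

        // Child entries: each one tests the byte at index 14 and supplies a real name.
        if (data.length > 14) {
            int headerSize = data[14] & 0xFF;
            switch (headerSize) {
                case 12:
                case 40:
                case 64:
                case 128:
                    name = "bitmap, header size " + headerSize; // placeholder label
                    break;
                default:
                    break; // no child matched, the name stays "unknown"
            }
        }

        // A name that is still "unknown" is treated as no result at all.
        return UNKNOWN.equals(name) ? null : name;
    }

    public static void main(String[] args) {
        byte[] header = new byte[18];
        header[0] = 'B';
        header[1] = 'M';
        header[14] = 124; // value reported for the GIMP export
        System.out.println(identify(header));
        header[14] = 40;
        System.out.println(identify(header));
    }
}
```

In this sketch the parent does match the GIMP file, but because no child replaces the "unknown" name, the caller sees the same null it would get for a file that is not a bitmap at all.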

So I guess it boils down to two things: the Magic file does not provide enough data to handle the base case without a proper child match, and GIMP produces a header of a format the Magic file does not know. According to the documentation on Wikipedia, the byte at the 0-based index 14 is the start of the DIB header and gives the size of that header in bytes. So perhaps GIMP is producing some kind of header that is 124 bytes in size, rather than one of the four sizes of the PC bitmap formats defined in the Magic file.
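
To check which header a given file carries, the size field can be read directly. The sketch below assumes the standard BMP layout, in which the DIB header size is a 4-byte little-endian integer at offset 14; DibHeaderSizeSketch and its methods are hypothetical names. In the BMP format documentation, 108 corresponds to BITMAPV4HEADER and 124 to BITMAPV5HEADER. The sample bytes are the first 18 bytes of the xxd dump in the original report, whose size field is 0x6c (108), while the file described in the comment above has 124.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for inspecting a BMP header; not part of simplemagic.
class DibHeaderSizeSketch {

    /** Reads the DIB header size: a 4-byte little-endian value at offset 14. */
    static long dibHeaderSize(byte[] data) {
        if (data.length < 18 || data[0] != 'B' || data[1] != 'M') {
            throw new IllegalArgumentException("not a BMP file header");
        }
        int raw = ByteBuffer.wrap(data, 14, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        return raw & 0xFFFFFFFFL; // treat the field as unsigned
    }

    /** Maps a few well-known header sizes to their names. */
    static String headerName(long size) {
        if (size == 12L) {
            return "BITMAPCOREHEADER";
        } else if (size == 40L) {
            return "BITMAPINFOHEADER";
        } else if (size == 108L) {
            return "BITMAPV4HEADER";
        } else if (size == 124L) {
            return "BITMAPV5HEADER";
        }
        return "other";
    }

    public static void main(String[] args) {
        // First 18 bytes of the xxd dump from the original report.
        byte[] header = {
            0x42, 0x4d, 0x7a, 0x75, 0x02, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x7a, 0x04, 0x00, 0x00, 0x6c, 0x00,
            0x00, 0x00
        };
        long size = dibHeaderSize(header);
        System.out.println(size + " " + headerName(size)); // prints "108 BITMAPV4HEADER"
    }
}
```

Neither 108 nor 124 is among the four values the Magic file checks for, so both variants of GIMP's output would miss every child entry.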