Open Gurfuzle opened 6 years ago
Here's an example file (zipped) example.bmp.zip
Great example Mike. Thanks much.
I've actually stumbled upon this myself and investigated a bit. The problem lies in MagicEntries.optimizeFirstBytes(), where it calls MagicEntry.getStartsWithByte() -> StringType.getStartingBytes() -> StringType$TestInfo.getStartingBytes(). This will always return null if the string is shorter than 4 characters.
This means that all file types whose magic bytes start with a string pattern shorter than 4 characters will not end up in the optimization index and are never actually considered during subsequent matching attempts. Since the Bitmap format has only the two fixed characters BM as its starting string, it also falls victim to this rule.
Actually, the calling code only ever uses the first byte anyway, so requiring more than that seems unnecessary.
Appreciate the look, @CrushaKRool. The code is supposed to use the first-byte stuff and then fall through to findMatch(). See https://github.com/j256/simplemagic/blob/211cf35f7a827958e78aba0c15ec4c8dcfe0699a/src/main/java/com/j256/simplemagic/entries/MagicEntries.java#L122
Let me get this test in place and then debug it.
Ah, you are right. I overlooked that.
Debugging it further, it seems to identify the first magic bytes as Bitmap but fails to match any of the child formats, which require the byte at index 14 to be either 12, 40, 64 or 128. In my case it's 124, though (exported from GIMP).
Unfortunately, since the name of the parent MagicEntry for bitmap is "unknown" and none of the children overwrite this with something else, it will end up as "unknown" in the ContentData and also not set any MIME types. And the method is coded to return null as ContentInfo in that case.
So I guess it boils down to two things: the Magic file does not provide enough data to handle the base case without a proper child match, and GIMP produces a header of a format the Magic file does not know. According to the documentation on Wikipedia, the byte at 0-based index 14 is the start of the DIB header and gives the size of that header in bytes. So perhaps GIMP is producing a header that is 124 bytes in size, rather than one of the four PC bitmap header sizes defined in the Magic file.
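To illustrate what handling the base case could look like, here is a minimal sketch. It is not SimpleMagic's code: the class BmpFallbackSketch, the method describe() and the returned strings are made up for this example. It checks the BM prefix, reads the DIB header size at index 14 as a little-endian 32-bit value, and still reports a generic bitmap when the size is not one of 12, 40, 64 or 128.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not part of SimpleMagic.
class BmpFallbackSketch {

    /** Returns a description of the data, or null if it does not start with "BM". */
    static String describe(byte[] data) {
        // Need the 14-byte file header plus the 4-byte DIB header size field.
        if (data.length < 18 || data[0] != 'B' || data[1] != 'M') {
            return null;
        }
        // The DIB header size is stored little-endian at 0-based index 14.
        int dibSize = ByteBuffer.wrap(data, 14, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        switch (dibSize) {
            case 12:
            case 40:
            case 64:
            case 128:
                // One of the sizes the child entries in the Magic file check for.
                return "PC bitmap (known DIB header size " + dibSize + ")";
            default:
                // Base case: still a bitmap, just with a header size no child matches.
                return "PC bitmap (unrecognized DIB header size " + dibSize + ")";
        }
    }

    public static void main(String[] args) {
        byte[] gimpLike = new byte[18];
        gimpLike[0] = 'B';
        gimpLike[1] = 'M';
        gimpLike[14] = 124; // header size from my GIMP export
        System.out.println(describe(gimpLike));
    }
}
```

The point of the default branch is that the parent match alone is enough to say "this is a bitmap", even when none of the children match.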
When I export images from GIMP as bitmap, the library does not recognize the magic number for them. When I run the file through xxd, I get:
00000000: 424d 7a75 0200 0000 0000 7a04 0000 6c00  BMzu......z...l.
00000010: 0000 9001 0000 9001 0000 0100 0800 0000  ................
00000020: 0000 0071 0200 232e 0000 232e 0000 0001  ...q..#...#.....
00000030: 0000 0001 0000 4247 5273 0000 0000 0000  ......BGRs......
This does start with 424d (BM), but it is not recognized as a bitmap.
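For anyone who wants to check the header by hand, here is a small sketch that decodes the first 32 bytes of the dump above. The class and method names (BmpDumpSketch, fromHex, dibHeaderSize) are made up for this example and are not part of any library. It confirms the BM prefix and reads the little-endian DIB header size at 0-based index 14.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for inspecting the dump, not part of SimpleMagic.
class BmpDumpSketch {

    // First two rows (32 bytes) of the xxd output, copied without spaces.
    static final String HEX =
            "424d7a7502000000" + "00007a0400006c00"
          + "0000900100009001" + "0000010008000000";

    /** Converts a string of hex digit pairs into bytes. */
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Reads the unsigned 32-bit little-endian DIB header size at index 14. */
    static long dibHeaderSize(byte[] data) {
        return ByteBuffer.wrap(data)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt(14) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] header = fromHex(HEX);
        boolean startsWithBM = header[0] == 'B' && header[1] == 'M';
        System.out.println("starts with BM: " + startsWithBM); // true
        System.out.println("DIB header size: " + dibHeaderSize(header)); // 108
    }
}
```

For this particular file the size field is 0x6c, i.e. 108, so the magic bytes are fine and the mismatch has to come from whatever is checked after them.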