rocketRobin / myrmec

This library detects a file's real format from its hex header (it identifies the file format by header).
Apache License 2.0

The .gif file is misrecognized as a .mpg file #3

Closed: alexinea closed this issue 6 years ago

alexinea commented 6 years ago

As the title shows, the .gif file is misrecognized as a .mpg file.

Is this a bug?

Thx a lot :)

rocketRobin commented 6 years ago

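Please check the file header. You can use a tool called "Binary Viewer" to inspect it, then post the header bytes here so I can take a look. Alternatively, send the file to me directly at jbl-2011@163.com, or attach it here, and I will look at the file's hex header.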

alexinea commented 6 years ago

The first 10 bytes are 47 49 46 38 39 61 2C 01 E0 00, and my code is:

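// Register the common file signatures.
var sniffer = new Sniffer();
sniffer.Populate(FileTypes.Common);
// Match the first 20 bytes; the second argument (true) requests all matches.
var results = sniffer.Match(imageData.Take(20).ToArray(), true);

// Return the first match, if there is one.
if (results != null && results.Any())
    return results.First();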

So the results are not sorted by match rate, but by shortest signature first?


BTW, which QQ group do you usually chat in?

rocketRobin commented 6 years ago

I found the reason. In your case the results are "mpg,mpeg,gif", because the hex header above matches both the "mpg mpeg" signature, 47, and the "gif" signature, 47 49 46 38 39 61. When you choose to match all, the sniffer returns mpg first: the mpg signature 47 is shorter than the gif signature 47 49 46 38 39 61, so it sits closer to the root of the metadata tree than gif. Originally, if you did not match all, you would get only "mpg,mpeg". I have now changed FileTypes.cs and removed 47 for mpg and mpeg, because according to the FILE SIGNATURES TABLE page, 47 is a sync byte that repeats throughout the file. I am in QQ group 376248054.
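To illustrate the ordering problem described above, here is a minimal sketch that does not use Myrmec's actual implementation. The SignatureDemo class, its Signatures table, and the Match helper are hypothetical names made up for this example. It assumes a flat list of signatures rather than a metadata tree, and it shows one possible way to rank results so that the longest (most specific) matching signature comes first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Myrmec: ranks signature matches by length.
public static class SignatureDemo
{
    // Illustrative table only; the single-byte 0x47 entry mirrors the
    // mpg/mpeg record that was removed from FileTypes.cs.
    public static readonly (string Ext, byte[] Sig)[] Signatures =
    {
        ("mpg,mpeg", new byte[] { 0x47 }),
        ("gif", new byte[] { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }),
    };

    // Returns every extension whose signature is a prefix of the header,
    // longest signature first, so the most specific match is ranked on top.
    public static List<string> Match(byte[] header)
    {
        return Signatures
            .Where(s => header.Length >= s.Sig.Length
                        && header.Take(s.Sig.Length).SequenceEqual(s.Sig))
            .OrderByDescending(s => s.Sig.Length)
            .Select(s => s.Ext)
            .ToList();
    }

    public static void Main()
    {
        // The first 10 bytes reported in this issue.
        byte[] header = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x2C, 0x01, 0xE0, 0x00 };
        Console.WriteLine(string.Join(" | ", Match(header)));
    }
}
```

With the header from this issue, both entries match, but the six-byte gif signature is ranked ahead of the one-byte mpg,mpeg signature. Sorting by signature length is only one design choice; removing overly short signatures, as was done in FileTypes.cs, avoids the false match entirely.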

alexinea commented 6 years ago

Great. For now, I am using my own file type list named ImageClassTypes, which looks like your FileTypes.Common but without any of the other extension formats.

Thx 👍

xingwen1987 commented 6 years ago

The .NET Core Community welcomes you to join!