hey-red / Mime

.NET wrapper for libmagic
MIT License

Different extension compared to the `file` command #52

Open alireza-rezaee opened 9 months ago

alireza-rezaee commented 9 months ago

It seems something goes wrong here: the library reports the .gz file as .bin, while the `file` command recognizes it correctly. Doesn't libmagic have an API to get the extension directly? If I understand correctly, we are actually using a MIME type mapping as an alternative.

[Fact]
public void Guess_Gzip_ReturnsSameAsNative()
{
    // small gzip file: https://github.com/mathiasbynens/small
    byte[] s_gzipBytes =
    [
        0x1f, 0x8b, 0x08, 0x00, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    ];

    var actualMimeType = GuessMimeType(s_gzipBytes);
    var actualExtension = GuessExtension(s_gzipBytes);

    // $ file gzip.gz --mime
    // → gzip.gz: application/gzip; charset=binary
    string expectedMimeType = "application/gzip";

    // $ file gzip.gz --extension
    // → gzip.gz: gz/tgz/tpz/ipk/vbox-extpack/svgz
    string[] expectedExtensions = [ "gz", "tgz", "tpz", "ipk", "vbox-extpack", "svgz"];

    Assert.Equal(expectedMimeType, actualMimeType);
    Assert.Contains(expectedExtensions, e => e == actualExtension); // ← Exception raised here
}

Assert.Contains() Failure
Not found: (filter expression)
In value:  String[] ["gz", "tgz", "tpz", "ipk", "vbox-extpack", ...]
   at Test.UnitTest.Guess_Gzip_ReturnsSameAsNative() in .../UnitTest.cs:line 28
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

This is probably because of MimeTypesMap, which depends on the MIME types known by Apache: https://github.com/hey-red/Mime/blob/b0582324592c574fda00e0705e4ed036fc918f8e/src/Mime/MimeGuesser.cs#L99

hey-red commented 9 months ago

If you want to get extension directly from libmagic:

using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(@"/path/to/gzip.gz"); // from file
Console.WriteLine(result); // -> gz/tgz/tpz/ipk/vbox-extpack/svgz/blend/dia/gnucash/rdata/xoj

However for this file it doesn't work when the magic_buffer method is used:

byte[] buf = File.ReadAllBytes(@"/path/to/gzip.gz");
using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(buf, buf.Length);
Console.WriteLine(result); // -> "???"

I have no idea why the behaviour differs. But I think you can update the MimeTypesMap dictionary before you get results from MimeGuesser: `MimeTypesMap.AddOrUpdate("application/gzip", "gz");` Alternatively, create your own dictionary with a mime<->extension mapping.
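The "own dictionary" option could look roughly like the sketch below. `ExtensionLookup` is a hypothetical helper, not part of Mime or MimeTypesMap. The extension list for application/gzip is copied from the `file --extension` output quoted in the test above, and the "bin" fallback is an assumption that mirrors the result reported in this issue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a hand-maintained MIME type -> extensions table.
public static class ExtensionLookup
{
    private static readonly Dictionary<string, string[]> s_map =
        new(StringComparer.OrdinalIgnoreCase)
        {
            // Taken from `file gzip.gz --extension` as quoted in the issue.
            ["application/gzip"] = new[] { "gz", "tgz", "tpz", "ipk", "vbox-extpack", "svgz" },
        };

    public static IReadOnlyList<string> GetExtensions(string mimeType)
    {
        // Drop parameters such as "; charset=binary" before the lookup.
        int semicolon = mimeType.IndexOf(';');
        string key = (semicolon >= 0 ? mimeType.Substring(0, semicolon) : mimeType).Trim();

        // Assumed fallback: "bin" for anything not in the table.
        return s_map.TryGetValue(key, out var extensions) ? extensions : new[] { "bin" };
    }
}
```

Passing the MIME type returned by GuessMimeType to `ExtensionLookup.GetExtensions` and taking the first entry would then give "gz" for the gzip sample, independent of what MimeTypesMap knows.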

alireza-rezaee commented 9 months ago

> byte[] buf = File.ReadAllBytes(@"/path/to/gzip.gz");
> using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
> var result = magic.Read(buf, buf.Length);
> Console.WriteLine(result); // -> "???"

This difference in behavior is strange, but the buffer read works with this very similar gzip-name.gz file:

$ xxd gzip.gz
00000000: 1f8b 0800 ae86 e15b 0203 0300 0000 0000  .......[........
00000010: 0000 0000                                ....

$ xxd gzip-name.gz
00000000: 1f8b 0808 ae86 e15b 0203 6e00 0300 0000  .......[..n.....
00000010: 0000 0000 0000                           ......

// gzip-name.gz
byte[] fileBytes =
[
    0x1f, 0x8b, 0x08, 0x08, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x6e, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00
];
using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
magic.Read(fileBytes, fileBytes.Length); // -> "gz/tgz/tpz/zabw/svgz/adz/kmy/xcfgz"
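
For reference, the two dumps differ in the header's FLG byte at offset 3: 0x00 in gzip.gz versus 0x08 in gzip-name.gz. In the gzip format (RFC 1952) bit 0x08 is FNAME, which is why gzip-name.gz carries the extra `6e 00` ("n" plus a terminator) right after the 10-byte fixed header. Below is a small sketch to check that flag. `GzipHeader` is a hypothetical helper, and whether this flag is what changes libmagic's buffer result is not established here.

```csharp
using System;

// Hypothetical helper for inspecting the fixed 10-byte gzip header (RFC 1952).
public static class GzipHeader
{
    // FLG bit 3: an original file name follows the fixed header.
    private const byte FNameFlag = 0x08;

    // True when the buffer is long enough and starts with the gzip magic 1f 8b.
    public static bool HasGzipMagic(byte[] data) =>
        data.Length >= 10 && data[0] == 0x1f && data[1] == 0x8b;

    // True when the FLG byte (offset 3) has the FNAME bit set.
    public static bool HasOriginalName(byte[] data) =>
        HasGzipMagic(data) && (data[3] & FNameFlag) != 0;
}
```

With the byte arrays from this thread, `HasOriginalName` is false for gzip.gz and true for gzip-name.gz.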