neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License

How to add support for audio files such as mp3/mp4 #43

Open apchenjun opened 2 years ago

apchenjun commented 2 years ago

How to add support for audio files such as mp3/mp4

neilharvey commented 2 years ago

Hey, what you need to do is create a new format class that inherits from FileFormat and supplies the signature (header bytes), media type, and extension for mp3/mp4 files.

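An mp3 file with an ID3v2 tag has the signature 49 44 33 ("ID3") at offset zero and a media type of audio/mpeg, so an implementation might look like this:

using FileSignatures;

public class Mpeg3 : FileFormat
{
    // "ID3" at offset 0, media type audio/mpeg, extension mp3
    public Mpeg3() : base(new byte[] { 0x49, 0x44, 0x33 }, "audio/mpeg", "mp3") { }
}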
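Not every mp3 carries an ID3v2 tag, so checking only for "ID3" will miss untagged files. Below is a minimal stand-alone sketch of a hypothetical helper (`Mp3Sniffer`, not part of FileSignatures) that also accepts a bare MPEG audio frame header at offset zero, assuming that in an untagged file the first Layer III frame starts right at the beginning.

```csharp
// Hypothetical helper for illustration only; not part of FileSignatures.
public static class Mp3Sniffer
{
    public static bool LooksLikeMp3(byte[] header)
    {
        // Tagged file: starts with the ASCII bytes "ID3" (49 44 33).
        if (header.Length >= 3 && header[0] == 0x49 && header[1] == 0x44 && header[2] == 0x33)
        {
            return true;
        }

        // Untagged file (assumption: first frame at offset 0).
        // An MPEG audio frame header begins with 11 sync bits (0xFF, then the
        // top 3 bits of the next byte); mask 0xE6 also keeps the two layer bits,
        // and layer bits 01 mean Layer III, so the masked value must be 0xE2.
        return header.Length >= 2
            && header[0] == 0xFF
            && (header[1] & 0xE6) == 0xE2;
    }
}
```

A two-byte frame-sync check is much looser than a multi-byte signature, so expect more false positives on arbitrary binary data than with the "ID3" check alone.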

For mp4 it looks as though there are a few different possible headers, which represent slightly different formats. 66 74 79 70 69 73 6F 6D ("ftypisom") at an offset of 4 bytes (ISO Base Media file (MPEG-4)) looks to be the most common, so an implementation could look like this:

using FileSignatures;

public class Mpeg4 : FileFormat
{
    // "ftypisom" starting at byte offset 4, media type audio/mp4, extension mp4
    public Mpeg4() : base(new byte[] { 0x66, 0x74, 0x79, 0x70, 0x69, 0x73, 0x6F, 0x6D }, "audio/mp4", "mp4", 4) { }
}
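To make the offset argument concrete, here is a minimal stand-alone sketch of matching a signature at a byte offset. `SignatureMatcher.Matches` is a hypothetical name for illustration, not FileSignatures' internal code; it only shows why "ftypisom" is declared at offset 4: the first four bytes of an MP4 hold the size of the ftyp box, which varies from file to file.

```csharp
// Hypothetical helper for illustration; FileSignatures' own matching may differ.
public static class SignatureMatcher
{
    // True when `signature` appears in `header` starting at byte `offset`.
    public static bool Matches(byte[] header, byte[] signature, int offset = 0)
    {
        // Header too short to contain the whole signature at this offset.
        if (offset < 0 || header.Length < offset + signature.Length)
        {
            return false;
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (header[offset + i] != signature[i])
            {
                return false;
            }
        }

        return true;
    }
}
```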

If you need to support more variations or other audio formats, just add more classes as needed. Then, to load the formats, use the FileFormatLocator class, which scans your assembly for FileFormat types:

using System.Reflection;
using FileSignatures;

// Find the assembly with the new types to load.
var assembly = typeof(Mpeg3).GetTypeInfo().Assembly;

// Option 1: just the formats defined in the assembly containing your custom formats.
// Use this if you don't care about the built-in formats.
var formats = FileFormatLocator.GetFormats(assembly);

// Option 2: the formats defined in the assembly plus all the built-in defaults.
// Use this instead if you want both your custom formats and the built-in ones.
// var formats = FileFormatLocator.GetFormats(assembly, true);

Finally, pass your list of formats to the FileFormatInspector constructor and use that instance to inspect your files.

// Load our class with the formats we're interested in
var inspector = new FileFormatInspector(formats);

Hope that helps!

apchenjun commented 2 years ago

@neilharvey thanks

tiesont commented 2 years ago

Yep, that's pretty much what my MP4 implementation looks like, although I only check the first four bytes (0x66, 0x74, 0x79, 0x70). Curious as to whether that means I could be getting some false matches?

apchenjun commented 2 years ago

@tiesont https://www.garykessler.net/library/file_sigs.html https://en.wikipedia.org/wiki/List_of_file_signatures

neilharvey commented 2 years ago

@tiesont From what I've read, the first four bytes of the signature (66 74 79 70, "ftyp") mean it's a QuickTime-style format, and the next four are the subtype, e.g. 66 74 79 70 4D 34 41 20 is QuickTime - M4A.
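Following that idea, a hypothetical helper can read the subtype (the ftyp box's major brand) rather than match a fixed eight bytes. This sketch assumes the ISO base media layout: a 4-byte big-endian box size, the ASCII box type "ftyp", then a 4-byte major brand such as "isom" or "M4A ". The compatible brands that follow are ignored here.

```csharp
using System.Text;

// Hypothetical helper for illustration; not part of FileSignatures.
public static class FtypReader
{
    // Returns the 4-character major brand, or null if the header
    // does not start with an ftyp box.
    public static string? GetMajorBrand(byte[] header)
    {
        // Need size (0-3), type (4-7) and major brand (8-11).
        if (header.Length < 12)
        {
            return null;
        }

        string boxType = Encoding.ASCII.GetString(header, 4, 4);
        if (boxType != "ftyp")
        {
            return null;
        }

        return Encoding.ASCII.GetString(header, 8, 4);
    }
}
```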

tiesont commented 2 years ago

> @tiesont https://www.garykessler.net/library/file_sigs.html https://en.wikipedia.org/wiki/List_of_file_signatures

I've actually seen both of those before, but thanks for the reminder.

I think I see why I stopped at 4 bytes: all of those MP4 variants share the same first four, so I'm assuming MP4 video whenever I find them. That's probably not correct, especially since it also seems to match the 3GPP type.
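Once the major brand is known, one way to avoid assuming MP4 video for every ftyp file is a small lookup keyed on the brand. The mapping below is a hypothetical, non-exhaustive sketch: the media types are common registrations, but treating "isom", "mp41" and "mp42" as video is an assumption, since such files can also contain only audio.

```csharp
using System;

// Hypothetical, non-exhaustive brand-to-media-type mapping for illustration.
public static class BrandClassifier
{
    public static string Classify(string majorBrand)
    {
        // 3GPP brands such as "3gp4" or "3gp5" share the "3gp" prefix.
        if (majorBrand.StartsWith("3gp", StringComparison.Ordinal))
        {
            return "video/3gpp";
        }

        switch (majorBrand)
        {
            case "M4A ":
                return "audio/mp4";
            case "isom":
            case "mp41":
            case "mp42":
                return "video/mp4"; // assumption: may actually be audio-only
            case "qt  ":
                return "video/quicktime";
            default:
                return "application/octet-stream"; // unknown brand: don't guess
        }
    }
}
```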