neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License

Use official, IANA-registered media type for Windows executables #68

Closed: cocowalla closed this issue 2 months ago

cocowalla commented 2 months ago

The Executable FileFormat currently uses media type application/octet-stream, which is a generic type representing any unknown binary data.

However, the application/vnd.microsoft.portable-executable type describes Windows Portable Executable (PE) files, and is registered with IANA: https://www.iana.org/assignments/media-types/application/vnd.microsoft.portable-executable. Please consider using application/vnd.microsoft.portable-executable instead.

tiesont commented 2 months ago

Do you have a reference for the magic bytes that uniquely identify a PE file? Granted, I didn't search for too long, but the best I've found so far is still fairly ambiguous.

cocowalla commented 2 months ago

Well, it doesn't get more official than the venerable Raymond Chen, who confirms the "MZ" header, and that it was named after another Microsoft legend, Mark Zbikowski 😉

Besides that, Wikipedia is in agreement too - 0x4D, 0x5A seems good 👍
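As a rough illustration of the check being discussed, here is a minimal sketch in Python. It is not the library's code, and the names `MZ_MAGIC` and `has_mz_header` are hypothetical. It only tests the two bytes mentioned above, so it matches every MZ-family file, not just PE files.

```python
# Hypothetical helper, not part of FileSignatures: checks only the
# two-byte "MZ" signature (0x4D, 0x5A) at the very start of the data.
MZ_MAGIC = bytes([0x4D, 0x5A])  # ASCII "MZ"


def has_mz_header(data: bytes) -> bool:
    """Return True if the data begins with the MZ signature."""
    return data[:2] == MZ_MAGIC
```

A caller would pass the first few bytes of a file, for example `has_mz_header(open(path, "rb").read(2))`.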

tiesont commented 2 months ago

Wikipedia also seems to indicate that those bytes match quite a few types: https://en.wikipedia.org/wiki/List_of_file_signatures (see DOS MZ executable and its descendants)

Not saying you're wrong, by any means, but 4D 5A seems like it matches more than just PE files. (I'm nowhere near an expert on this, so if I'm wrong, then I'm wrong, just making some observations).

cocowalla commented 2 months ago

I took another look, but I only see a single entry for 4D 5A? Or if you meant because it also matches descendants of MZ files (such as PE files), that's the desired behaviour, as all those descendant types are still executable files.

tiesont commented 2 months ago

But is everything with those bytes a PE file? Seems like "no", so that media type won't always be correct.

If those bytes always identify a Windows executable, that's probably helpful, just not sure that the media type you're proposing matches what Windows itself would report (assuming it distinguishes between them). I'm wondering how hard it would be to find a decent collection of (safe) examples to test against...

cocowalla commented 2 months ago

Oh, right I see what you mean - you want to distinguish between MZ files and PE files? TBH, I think that's going to be overkill for almost every use case, and we'd be better simply detecting all MZ files as "Windows executables", which for most people are synonymous with "PE files" (even if not technically true). Then if someone wants to go deeper into the PE file format, there are specialised libraries for that.

So I guess the only real question is which MIME type to use. application/octet-stream is not correct, and there is no "official" MIME type for generic Windows executables, only one for PE files. application/x-msdownload, while not official, seems to be used a lot for Windows executables too, so I guess if you weren't keen on the more specific application/vnd.microsoft.portable-executable, that might suffice as an alternative?

tiesont commented 2 months ago

Right.

In my use case, if the file seems to be an executable, I reject the upload and abort my processing, so it really doesn't matter what the reported MIME is. If I wanted to allow some executables but not others, then I'd probably want the best known match (whatever that may be). No idea how common any of that is.
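A reject-on-executable upload check along these lines could look roughly like the Python sketch below. It is only an illustration under assumptions: the function names are hypothetical, the upload is assumed to be a seekable binary stream, and the only test applied is the two-byte MZ signature, so the reported MIME type never comes into play.

```python
import io


def looks_like_windows_executable(stream) -> bool:
    """Peek at the first two bytes of a seekable binary stream.

    The stream position is restored afterwards so later processing
    still sees the upload from wherever it was positioned.
    """
    start = stream.tell()
    try:
        return stream.read(2) == b"MZ"
    finally:
        stream.seek(start)


def accept_upload(stream) -> bool:
    """Hypothetical policy: reject anything that starts with MZ."""
    return not looks_like_windows_executable(stream)
```

For example, `accept_upload(io.BytesIO(b"%PDF-1.7"))` accepts the upload, while a stream starting with `MZ` is rejected.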

In the end, it's whatever @neilharvey decides. I'm just weighing in as someone who uses the library.

cocowalla commented 2 months ago

Forgot, there is also application/x-executable. While not an official MIME type, it's been used on Linux systems for a very long time.

neilharvey commented 2 months ago

From what I can tell, DOS MZ executable is the container format for several sub-formats and will always have the header MZ at the start of the file. I'm not sure what the correct MIME type should be, or whether there is even a definition for one.

Portable Executable is a subtype of DOS MZ and will have an MZ header followed by a PE header. However, looking at some samples, the PE header does not have a fixed location - there are typically a few bytes between the MZ and the PE headers (looks like some sort of validation function). Both exe and dll files are types of PE file.

Whilst I think that technically @tiesont is correct and that it's possible that an MZ header could match something which is not a PE, I think in practice what most people encounter is likely to be a PE file. This is somewhat reinforced by the fact that this appears to be the only executable MIME type that Microsoft have registered. So I think that application/vnd.microsoft.portable-executable is probably the right answer for 99% of use cases.

I could extend the Executable format to search for the PE header to guarantee that the format is correct but it would be slightly less efficient (needing to read an arbitrary number of header bytes before we give up) so I'm not sure I can be bothered unless someone raises an issue that it's wrong :)
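For reference, the search may not need to be open-ended. Microsoft's PE format documentation describes a 4-byte little-endian value at offset 0x3C of the MZ header that gives the file offset of the `PE\0\0` signature. The Python sketch below shows that lookup. It is an illustration only, not how FileSignatures works; `is_pe` and `media_type_for` are hypothetical names, and falling back to application/octet-stream for non-PE MZ files is an assumption made for the example.

```python
import struct

PE_MEDIA_TYPE = "application/vnd.microsoft.portable-executable"
FALLBACK_MEDIA_TYPE = "application/octet-stream"  # assumed fallback


def is_pe(data: bytes) -> bool:
    """Return True if data has an MZ header that points at a PE signature."""
    # The MZ header must be long enough to contain the field at 0x3C..0x3F.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # 4-byte little-endian file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short or empty slice, so this is False.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def media_type_for(data: bytes) -> str:
    """Hypothetical mapping: PE media type only when the PE signature is found."""
    return PE_MEDIA_TYPE if is_pe(data) else FALLBACK_MEDIA_TYPE
```

The cost is that the caller must supply enough of the file to cover the offset stored at 0x3C, which is larger than the two bytes needed for the MZ check alone.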