neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License

All types of Jpeg files are not handled #1

Closed (guilhemperalta closed this issue 7 years ago)

guilhemperalta commented 7 years ago

Hello! Nice library here :)

I used it at work and it's almost perfect for what I need. However, I found that it does not handle all types of JPEG files. According to Wikipedia and this reference page, there are 3 other possible signatures for them:

The library can be easily extended, so I had no trouble using it anyway, but maybe we could add all types of JPEG files? Or perhaps limit the signature to the first 3 bytes?

neilharvey commented 7 years ago

Hi, thanks for the feedback! :)

I'd be tempted to change the existing Jpeg format to match the first three bytes, then have formats for the other three inherit from that. So then if you wanted to test for any type of JPEG you could have the following:

if(format is Jpeg) {
  // Matches JFIF, EXIF, SPIFF or RAW
}

if(format is JpegExif) {
  // Only matches EXIF
}

This seems like it could be a nice solution, as it isn't a breaking change and fits in with how some of the other formats work. What do you think?
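To make the idea concrete, here is a minimal sketch of how the `is` checks behave with an inheritance hierarchy. The classes below are empty, hypothetical stand-ins rather than the library's real ones, and the names `JpegJfif`, `JpegSpiff` and `FormatChecks` are assumptions made for illustration.

```csharp
using System;

// Hypothetical stand-ins for illustration only; real format classes
// would also carry the signature bytes used for matching.
class Jpeg { }
class JpegJfif : Jpeg { }
class JpegExif : Jpeg { }
class JpegSpiff : Jpeg { }

static class FormatChecks
{
    // True for the base type and for every subtype that inherits from it.
    public static bool IsAnyJpeg(object format) => format is Jpeg;

    // True only for the EXIF subtype.
    public static bool IsExif(object format) => format is JpegExif;
}
```

With these definitions, `FormatChecks.IsAnyJpeg(new JpegExif())` is true, while `FormatChecks.IsExif(new JpegJfif())` is false, so existing code that tests for `Jpeg` would keep working.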

guilhemperalta commented 7 years ago

That would be nice indeed. It seems that several camera makers use custom headers starting from the 4th byte (this site also mentions FF D8 FF E3 for some Samsung cameras), so a solution with specialized classes and a more relaxed base class as a fallback would fit very well.

neilharvey commented 7 years ago

After spending a bit of time looking at different specifications, it looks as though the magic number consists of two parts. FF D8 is the SOI (Start of Image) marker, which is common to all JPEG formats. This is then followed by an app marker, e.g.

FF E0 - APP0 - JFIF
FF E1 - APP1 - EXIF
FF E8 - APP8 - SPIFF

I've implemented the base Jpeg class to recognise the SOI, and then added additional classes for JFIF, EXIF and SPIFF which contain the specific app markers.
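As a rough illustration of the two-part check (not the code that was actually committed), the matching could look something like the sketch below. `JpegKind` and `JpegHeaderSketch` are hypothetical names, and the function works on a raw byte array instead of the library's format classes.

```csharp
using System;

// Hypothetical result type for this sketch.
enum JpegKind { NotJpeg, Generic, Jfif, Exif, Spiff }

static class JpegHeaderSketch
{
    public static JpegKind Classify(byte[] header)
    {
        // Every JPEG starts with the SOI marker FF D8.
        if (header.Length < 2 || header[0] != 0xFF || header[1] != 0xD8)
            return JpegKind.NotJpeg;

        // The next two bytes, when present, hold the first segment marker.
        if (header.Length >= 4 && header[2] == 0xFF)
        {
            switch (header[3])
            {
                case 0xE0: return JpegKind.Jfif;   // APP0
                case 0xE1: return JpegKind.Exif;   // APP1
                case 0xE8: return JpegKind.Spiff;  // APP8
            }
        }

        // SOI matched but no known app marker: fall back to the base format.
        return JpegKind.Generic;
    }
}
```

For example, `JpegHeaderSketch.Classify(new byte[] { 0xFF, 0xD8, 0xFF, 0xE1 })` gives `JpegKind.Exif`, while a header of FF D8 FF E3 falls back to `JpegKind.Generic`.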

Each of the formats also defines an identifier which is located at offset 06 and is the ASCII-encoded name of the format (e.g. EXIF). I haven't added this because it meant reading more bytes to validate the format when we already have the answer from the app marker, but I can implement it if you think it's desirable to have that added validation in place.
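If that extra validation were wanted, a minimal sketch might look like the following. It is not part of the library; `IdentifierSketch` and `HasIdentifier` are hypothetical names. The comparison ignores case on the assumption that files may spell the name differently from how it is written here (Exif files, for instance, store it as "Exif").

```csharp
using System;
using System.Text;

static class IdentifierSketch
{
    // Offset of the identifier: 2 bytes SOI + 2 bytes app marker + 2 bytes segment length.
    private const int IdentifierOffset = 6;

    public static bool HasIdentifier(byte[] header, string identifier)
    {
        // Not enough bytes were read to contain the identifier.
        if (header.Length < IdentifierOffset + identifier.Length)
            return false;

        // Decode only the bytes where the identifier is expected to be.
        string actual = Encoding.ASCII.GetString(header, IdentifierOffset, identifier.Length);
        return string.Equals(actual, identifier, StringComparison.OrdinalIgnoreCase);
    }
}
```

This shows the cost mentioned above: at least ten bytes have to be read to confirm "JFIF" or "EXIF", compared with four for the SOI plus app marker.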

I've uploaded a prerelease version of the NuGet package which contains the changes. Could you give it a try and check that everything looks correct?

guilhemperalta commented 7 years ago

Thank you for your effort! For the specific use case I intend to use the library for, the validation of the SOI + app marker is more than enough. More globally, I don't think that pushing the validation further would be of any substantial value to anyone (maybe I'm wrong 😄). Anyway, I'll give it a try tomorrow.

guilhemperalta commented 7 years ago

Ok, the new version you packaged correctly handles all the files I had. Thanks a lot!

neilharvey commented 7 years ago

Great! I've merged the changes into master and pushed the release version of 1.1.0 to NuGet so it should show up soon. 😄