neilharvey / FileSignatures

A small library for detecting the type of a file based on header signature (also known as magic number).
MIT License
250 stars 41 forks source link

Add Offset and add 4 format files with offset #11

Closed mexvod closed 5 years ago

neilharvey commented 5 years ago

Hey, thanks for the PR. Would you be able to add some sample files and functional tests for the new formats you've added so that we can verify they work? I usually just save a blank document in the sample format which is enough for the headers to be created.

neilharvey commented 5 years ago

I did also notice OffsetFileFormat.IsMatch duplicates a lot of logic from FileFormat. This could be removed by merging the changes into the base class. Also needs some unit tests to prove that the offset works as expected.

Merging into FileFormat could work by adding the offset parameter to the constructor and then adding an overload with the current parameters that sets offset to zero, which would be a non-breaking change. The tricky part will be dealing with the header length, which I'm undecided how best to deal with.

Setting the property to the sum of the offset and the provided headerLength could be unintuitive for subclasses, but not summing them could be unintuitive for consumers. Thoughts?

If you can get me the samples for your new formats, I'd be happy to take the PR and then figure out how I want to merge it into the base class.

mexvod commented 5 years ago

Hey, i added sample files and functional tests

mexvod commented 5 years ago

Hem! Maybe so?

mexvod commented 5 years ago

And also, I noticed a problem with Union this: "return formatsInAssembly.Union(formatsThisAssembly);" Bug! Maybe Concat?

mexvod commented 5 years ago

Xls test signature not work (signature public ExcelLegacy() : base(new byte[] { 0x09, 0x08, 0x10, 0x00, 0x00, 0x06, 0x05, 0x00 }, "application/vnd.ms-excel", "xls", 512) https://www.filesignatures.net/index.php?search=xls&mode=EXT )

neilharvey commented 5 years ago

Thanks, I'll look at pulling in the formats and offset implementation.

neilharvey commented 5 years ago

I'll see if I can take a look at your XLS file. The signature you listed on the FileSignatures.net site isn't a reliable way to test for Excel files, because they are based on CFBF and the identifier can move to a different point in the file (which is probably why that site lists 8 different signatures for XLS) so we have to read the CFB header and use it to locate the CLSID which defines the format. But for some reason the CLSID is being returned as zero, I'll need to use a tool to see what's wrong with it.

mexvod commented 5 years ago

Thank! If you add a offset, I can add many different files. By the way, OpenOffice files first have a zip [0x50, 0x4B, 0x3, 0x4] header, and then 512 offset and headers: 'mimetypeapplication / vnd.oasis.opendocument.text', 'mimetypeapplication / vnd.oasis.opendocument.spreadsheet', ' mimetypeapplication / vnd.oasis.opendocument.presentation ' OR 0x6D, 0x69, 0x6D, 0x65, 0x74, 0x79, 0x70, 0x65, 0x61, 0x70, 0x70, 0x66, 0x69, 0x63, 0x61, 0x69, 0x63, 0x6C, 0x69, 0x63, 0x61, 0x69, 0x63, 0x6C, 0x69, 0x63, 0x61, 0x69, 0x69, 0x61, 0x69, 0x69, 0x69, 0x6D , 0x76, 0x6E, 0x64, 0x2E, 0x6F, 0x61, 0x73, 0x69, 0x73, 0x2E, 0x6F, 0x70, 0x65, 0x6F, 0x64, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x65, 0x75, 0x6F, 0x63, 0x75, 0x6F, 0x65, 0x75, 0x6F, 0x63, 0x75, 0x6F , 0x65, 0x73, 0x65, 0x6E, 0x74, 0x61, 0x74, 0x69, 0x6F, 0x6E and more ...

mexvod commented 5 years ago

Also, with the zip header there are such files: xpi (from addons.mozilla.org, offset 30), epub (application / epub + zip, offset 30)

neilharvey commented 5 years ago

Hey, I've merged in your changes, although I had to revert the broken test because it was breaking the build. I'll look into this as a separate issue. I also want to add a couple of unit tests for the offset, so expect to push a new release with your changes in next couple of days or so.

mexvod commented 5 years ago

Thank! I will wait for updates.