hey-red / Mime

.NET wrapper for libmagic
MIT License

Excel files are detected as "zip" #28

Closed erdemkeren closed 4 years ago

erdemkeren commented 6 years ago

I have 'xlsx' files. When I try to validate them using the MimeGuesser.GuessExtension, I get "zip".

My expectation was to receive "xlsx" and "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"

hey-red commented 6 years ago

Try to use that .mgc file https://1drv.ms/u/s!As7Wg8FO5aVQhMshWVhUTmXjSnEN3g

// Set magic database
MimeGuesser.MagicFilePath = @"/path/to/magic.mgc";

Let me know if that helps.

hey-red commented 6 years ago

Package has been updated.

erdemkeren commented 6 years ago

No luck. I still get "application/zip" as a result of:

 var guessedMimeType = MimeGuesser.GuessMimeType(formFile.OpenReadStream());

using version 3.0.2 and the uploaded .mgc file.

Btw, sorry for my late response. @hey-red

hey-red commented 6 years ago

Hmm, can you provide that file (without data)?

erdemkeren commented 6 years ago

Sure! See https://drive.google.com/file/d/1s9B-KZsW9jr3qyJMzhAjdD8uWUtbdU76/view?usp=sharing (I needed to edit this using Numbers, but it looks like the problem still exists). Btw, I have 3 different Excel files like this. All of them report application/zip.

hey-red commented 6 years ago

@erdemkeren Okay. I'll try to investigate it. Do you know where those files were created? MS Office, LibreOffice, or something else?

erdemkeren commented 6 years ago

I'll ask tomorrow, but I'm afraid they are generated by other automation systems.

Tomorrow edit: They told us that the Excel files are generated by flight schedule software.

dlatikay1 commented 6 years ago

We've got the same problem with all Microsoft Open XML formats, including docx, docm, and xlsm. This is a known problem (https://serverfault.com/q/338087/387902), and it seems to be a matter of loading the correct .mgc file, not a problem with your library at all. The difficult part is that those files are ZIP archives with a different extension, so libmagic needs to check for the existence of certain file names in the ZIP catalog, or even decompress something, to figure out the real type.
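To illustrate the "check for certain file names in the ZIP catalog" idea, here is a minimal sketch that uses `System.IO.Compression.ZipArchive` from the .NET base library. `OoxmlSniffer` and `GuessOoxmlExtension` are hypothetical names and are not part of the Mime library or libmagic. The entry names it looks for (`xl/workbook.xml`, `word/document.xml`, `ppt/presentation.xml`) are the conventional part names written by common Office tools, which is an assumption; a strict check would follow the package relationships instead. The sketch also does not distinguish macro-enabled variants such as xlsm or docm, which would require reading `[Content_Types].xml`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper, not part of the Mime library.
public static class OoxmlSniffer
{
    // Returns "xlsx", "docx" or "pptx" when the stream looks like an
    // Open XML package, or null when it is not one (or not a ZIP at all).
    public static string GuessOoxmlExtension(Stream stream)
    {
        try
        {
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                var names = new HashSet<string>(
                    zip.Entries.Select(e => e.FullName),
                    StringComparer.OrdinalIgnoreCase);

                // Every Open XML package carries this part at the root.
                if (!names.Contains("[Content_Types].xml")) return null;

                // Assumption: conventional main part names.
                if (names.Contains("xl/workbook.xml")) return "xlsx";
                if (names.Contains("word/document.xml")) return "docx";
                if (names.Contains("ppt/presentation.xml")) return "pptx";
                return null;
            }
        }
        catch (InvalidDataException)
        {
            // The stream is not a readable ZIP archive.
            return null;
        }
    }
}
```

One possible way to use it: when the libmagic guess comes back as "zip", pass the same upload stream to `GuessOoxmlExtension` and prefer its result if it is not null. The stream is left open, so reset its position before reading it again.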

I also noticed that it is necessary to tolerate .jpeg as the guess for files with the JPG extension.
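A small sketch of how that tolerance could look when comparing the uploaded file name against the guessed extension. `ExtensionCheck`, `Matches`, and the alias table are hypothetical and only illustrate the idea; the list of aliases is an assumption and would need to be extended for other formats.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the Mime library.
public static class ExtensionCheck
{
    // Assumption: extensions that should be treated as the same type.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "jpg", "jpeg" },
            { "jpe", "jpeg" },
        };

    // Strips the leading dot, lowercases, and maps aliases to one name.
    private static string Canonical(string extension)
    {
        var ext = extension.TrimStart('.');
        return Aliases.TryGetValue(ext, out var canonical)
            ? canonical
            : ext.ToLowerInvariant();
    }

    // True when the file name's extension and the guessed one agree.
    public static bool Matches(string fileName, string guessedExtension)
    {
        return Canonical(Path.GetExtension(fileName)) == Canonical(guessedExtension);
    }
}
```

With this, `ExtensionCheck.Matches("photo.JPG", "jpeg")` is accepted, while `ExtensionCheck.Matches("report.xlsx", "zip")` is still rejected.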

hey-red commented 6 years ago

Yes, it's a libmagic DB problem. I'm not a big fan of validation with libmagic, although it depends on what you will do with the file later. If you only need to store the file and you trust the source, then just use the extension or check the magic bytes with libmagic (or another lib like Mime-Detective). Otherwise, you need to fully read (decode) the file with other libs.