rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

Prefer application/x-ole-storage instead of application/x-tika-msoffice #54

gmcgibbon closed 3 years ago

gmcgibbon commented 3 years ago

Closes https://github.com/rails/marcel/issues/44

Prefer application/x-ole-storage as a fallback for MS Office documents. This was the behaviour of Marcel 0.3.3.

However, I'm resorting to this because I can't find a good magic matcher to identify *.olf files. The header of the binary file is already used to identify it as an OLE/Office file, and there don't appear to be (AFAICS) any bytes we can read at a consistent offset that denote application/vnd.ms-outlook.
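
To illustrate the problem, here is a rough sketch of what header-only detection can and cannot tell us. It is not Marcel's actual code: `ole_container?` and `fallback_type` are hypothetical helpers made up for this example. The only fact it relies on is the well-known 8-byte OLE2 compound file signature.

```ruby
require "stringio"

# Well-known OLE2 / Compound File Binary signature (first 8 bytes of the file).
OLE_SIGNATURE = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b.freeze

# Hypothetical helper (not part of Marcel): true when the stream starts
# with the OLE2 signature. Restores the read position afterwards.
def ole_container?(io)
  io.rewind
  header = io.read(OLE_SIGNATURE.bytesize)
  io.rewind
  header == OLE_SIGNATURE
end

# Hypothetical fallback (an assumption about the intent of this PR, not its
# implementation): if no more specific subtype was matched, report the generic
# container type rather than an internal grouping type.
def fallback_type(io, subtype: nil)
  return subtype if subtype
  ole_container?(io) ? "application/x-ole-storage" : "application/octet-stream"
end

ole_like = StringIO.new(OLE_SIGNATURE + ("\x00".b * 504))
plain    = StringIO.new("hello world".b)

puts fallback_type(ole_like)
puts fallback_type(plain)
puts fallback_type(ole_like, subtype: "application/msword")
```

The header alone only proves the file is an OLE container. Anything more specific has to come from a subtype matcher, the filename, or the declared type.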

The more specific matchers for Office subtypes appear to be very subtle, though. For example, the one we use for application/msword: https://github.com/rails/marcel/blob/85c2559ac0253b9b33cae10c36e3f2584efe3991/data/custom.xml#L13-L17

Regardless, I think we can agree that x-tika-msoffice was supposed to be an internal type grouping that shouldn't be surfaced in mime type detection.

pixeltrix commented 3 years ago

@gmcgibbon I'll leave these for you to merge after approval so you can manage any conflicts that may arise.