rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

Apache Tika MIME types used in marcel 1.0.0 missing common font types (font/woff, font/woff2,.. #34

Closed sbocinec closed 3 years ago

sbocinec commented 3 years ago

The new Apache Tika MIME types introduced in v1.0.0 (https://github.com/rails/marcel/commit/2e58d1986715420f0abbba060b6e158d6f4d3a05) are missing common font types, e.g. font/woff and font/woff2. Apache Tika seems to support only a limited set of TrueType fonts: https://tika.apache.org/1.20/formats.html#Font_formats.

The MIME types database used previously, which came from the freedesktop project by way of the mimemagic gem, did include these types: https://gitlab.freedesktop.org/xdg/shared-mime-info/-/blob/master/data/freedesktop.org.xml.in#L2085

The lack of these types is causing issues with validations. We use shrine with marcel to validate metadata, and were forced to upgrade to v1.0.0 due to the recent issues with mimemagic. We allow users of our app to upload these modern font formats, which are heavily used on the web.

I'm not yet sure about the best solution for this issue. Ideally the changes would go upstream (Apache Tika) so marcel can pull them later, though at a quick look the project seems to be, understandably, limiting the MIME types it supports. So at the least I wanted to report and potentially discuss the issue, as I think others will be affected by the same or other missing types.

georgeclaghorn commented 3 years ago

For now, I think it would be best to add these to our custom types in data/custom.xml. There are few enough registered font/* types that we should probably just cover all of them.

sbocinec commented 3 years ago

@georgeclaghorn thanks for your answer. I agree, adding to marcel's data/custom.xml might be better than updating Tika. Though it will probably generate a lot of maintenance and releases to regularly update the gem to (ideally) keep parity with the mimemagic / freedesktop.org.xml MIME types :thinking:.

Anyway, I've tried to follow the suggestion and added the mentioned types in PR #37, to check whether this is the way to go.
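
For anyone hitting the same validation failures before a release with these types is available, here is a rough sketch of what detecting the font formats involves. It is not marcel's implementation and not the content of PR #37: `FONT_SIGNATURES` and `sniff_font_type` are hypothetical names, and the code only compares the first four bytes of a file against the well-known signatures for WOFF, WOFF2, OpenType, TrueType and font collections.

```ruby
require "stringio"

# Hypothetical lookup table (not part of marcel): leading 4-byte signature
# mapped to the registered font/* media type.
FONT_SIGNATURES = {
  "wOFF".b             => "font/woff",
  "wOF2".b             => "font/woff2",
  "OTTO".b             => "font/otf",
  "\x00\x01\x00\x00".b => "font/ttf",
  "ttcf".b             => "font/collection"
}.freeze

# Hypothetical helper: returns the font media type for an IO-like object,
# or nil when the first four bytes match none of the signatures above.
def sniff_font_type(io)
  header = io.read(4)  # nil when the IO is empty
  io.rewind            # leave the IO usable for later readers
  return nil if header.nil?

  FONT_SIGNATURES[header.b]
end

sniff_font_type(StringIO.new("wOF2\x00\x01\x00\x00".b)) # => "font/woff2"
sniff_font_type(StringIO.new("plain text"))              # => nil
```

In an app that already depends on marcel, a table like this would more naturally be registered through the gem's own extension mechanism than hand-rolled; that wiring is not shown here.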