rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

ogg vorbis files recognized as audio/vorbis #64

Closed ClearlyClaire closed 2 years ago

ClearlyClaire commented 3 years ago

Detecting the MIME type of an Ogg Vorbis file returns audio/vorbis instead of audio/ogg. According to https://wiki.xiph.org/MIME_Types_and_File_Extensions, audio/vorbis is meant for Vorbis streams without a container.

irb(main):003:0> Marcel::Magic.by_magic(File.open('spec/fixtures/files/boop.ogg'))
=> #<Marcel::Magic:0x0000560870431ec0 @mediatype="audio", @subtype="vorbis", @type="audio/vorbis">
irb(main):002:0> Marcel::MimeType.for(Pathname.new('spec/fixtures/files/boop.ogg'))
=> "audio/vorbis"
irb(main):004:0> Marcel::MimeType.for(File.open('spec/fixtures/files/boop.ogg'), name: 'boop.ogg')
=> "audio/vorbis"

This is a bit surprising and might throw some tools off, as audio/vorbis is generally neither expected nor associated with any file format or extension (since it's for the streams themselves and not files/containers).

Passing declared_type works, but it might not be provided, or it might come from an untrusted source:

irb(main):006:0> Marcel::MimeType.for(File.open('spec/fixtures/files/boop.ogg'), name: 'boop.ogg', declared_type: 'audio/ogg')
=> "audio/ogg"
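When declared_type can't be relied on, one possible stopgap is to post-process the detected type on the caller's side. The sketch below is only an illustration: `ogg_container?` and `normalize_audio_type` are hypothetical helpers, not part of Marcel. It assumes that checking for the Ogg capture pattern ("OggS") at the start of the data is enough to tell a containerized file from a bare Vorbis stream.

```ruby
require "stringio"

# Every Ogg page starts with this 4-byte capture pattern.
OGG_CAPTURE_PATTERN = "OggS".b

# Hypothetical helper (not part of Marcel): true if the IO starts with
# the Ogg capture pattern. Rewinds the IO before and after reading.
def ogg_container?(io)
  io.rewind
  header = io.read(OGG_CAPTURE_PATTERN.bytesize)
  header == OGG_CAPTURE_PATTERN
ensure
  io.rewind
end

# Hypothetical helper (not part of Marcel): remaps audio/vorbis to
# audio/ogg only when the data really is wrapped in an Ogg container.
# Any other detected type is returned unchanged.
def normalize_audio_type(detected_type, io)
  return detected_type unless detected_type == "audio/vorbis"

  ogg_container?(io) ? "audio/ogg" : detected_type
end
```

In practice the first argument would be whatever Marcel::MimeType.for returned for the same IO, e.g. normalize_audio_type(Marcel::MimeType.for(file, name: 'boop.ogg'), file). This only papers over the problem on the caller's side and does not fix the underlying detection rule.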

EDIT: this seems to come from https://github.com/rails/marcel/blob/main/data/tika.xml#L5135-L5146, which was introduced in Apache Tika by https://github.com/apache/tika/commit/41c6749f7372b6ac3a9acfd309a4aaca3e077a07, but I do think it's wrong, as audio/vorbis seems to be defined by RFC 5215 and specific to RTP streams.

gmcgibbon commented 3 years ago

This is mentioned on https://github.com/rails/marcel/issues/48 as a regression between 0.3.3 and 1.0.0. I think regardless of whether it is right or wrong, this is still a regression from the other MIME DB we were using. I'll try fixing it!