Open mdavidn opened 2 years ago
Here's my workaround for now, added to an initializer.
if Marcel::MimeType.for("PK\03\04wmv2") == 'video/x-ms-wmv'
  Marcel::Magic.remove('video/x-ms-wmv')
end
I've just been bitten by this for a PDF as well. Looking at the definition, it seems that any instance of the string wmv2 in the first 8KB will trigger this match.
That seems wildly broad for a magic string, but I think the issue is that the Tika rule is designed to match a codec type, so it would only apply in the context of a file ending in .wmv, whereas Marcel applies it as a general magic string. There could be other mismatches like this in the Tika source file 😬
I have a valid ZIP archive that happens to include the bytes wmv2 in the first four kilobytes. Active Storage misidentifies the file as Windows Media Video. When scanning over such a broad range of bytes, WMV magic needs a lower priority than other matches.
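To illustrate the priority point, here is a minimal self-contained sketch. It is not Marcel's actual implementation: the `Rule` struct, the `RULES` table, and both `sniff_*` helpers are hypothetical names invented for this example, and the 0..8192 range is only an assumption based on the "first 8KB" figure mentioned in this thread. It shows how a broad-range scan can shadow an anchored signature, and how trying narrower rules first avoids that.

```ruby
# Hypothetical rule table for illustration only; Marcel stores its magic differently.
# offsets is the range of byte positions at which the magic string may start.
Rule = Struct.new(:type, :magic, :offsets)

RULES = [
  # Broad scan: "wmv2" anywhere near the start of the file (assumed range).
  Rule.new("video/x-ms-wmv", "wmv2".b, 0..8192),
  # Anchored signature: the ZIP local file header must sit at offset 0.
  Rule.new("application/zip", "PK\x03\x04".b, 0..0)
].freeze

# True if the magic string first occurs at an offset inside the rule's range.
def rule_matches?(rule, data)
  idx = data.b.index(rule.magic, rule.offsets.begin)
  !idx.nil? && idx <= rule.offsets.end
end

# Naive strategy: the first matching rule in table order wins.
def sniff_in_order(data, rules = RULES)
  rules.find { |rule| rule_matches?(rule, data) }&.type
end

# Alternative: try rules with the narrowest offset range first,
# so broad scans only win when nothing more specific matches.
def sniff_by_specificity(data, rules = RULES)
  sniff_in_order(data, rules.sort_by { |rule| rule.offsets.size })
end

# A ZIP header followed, 100 bytes later, by the bytes "wmv2".
zip_like = "PK\x03\x04".b + ("\x00".b * 100) + "wmv2".b

puts sniff_in_order(zip_like)       # prints "video/x-ms-wmv"
puts sniff_by_specificity(zip_like) # prints "application/zip"
```

Sorting by range size is just one possible way to express "lower priority"; an explicit priority field per rule would work equally well in a sketch like this.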