rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0

Replace mimemagic with Apache Tika's mime types data #30

Closed: georgeclaghorn closed this 3 years ago

Deradon commented 3 years ago

@georgeclaghorn Is it fine to include the Apache-licensed XML in an MIT-licensed project? If so, I'd be happy to get my hands dirty on this approach as well, if you'd like some support.

georgeclaghorn commented 3 years ago

@Deradon, the plan is to relicense this gem under MIT (for the code) and Apache (for the data).

We’re through most of the work, but thanks for offering!

pjmartorell commented 3 years ago

This SVG image is being detected as application/xml instead of image/svg+xml when calling Marcel::Magic.by_magic(file).type. I guess the following should be added to the custom.xml table:

  <mime-type type="image/svg+xml">
    <_comment>SVG image</_comment>
    <acronym>SVG</acronym>
    <expanded-acronym>Scalable Vector Graphics</expanded-acronym>
    <sub-class-of type="application/xml"/>
    <magic priority="80">
      <match type="string" value="&lt;!DOCTYPE svg" offset="0:256"/>
      <match type="string" value="&lt;svg" offset="0:256"/>
    </magic>
    <glob pattern="*.svg"/>
    <root-XML namespaceURI="http://www.w3.org/2000/svg" localName="svg"/>
  </mime-type>
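To make the magic rule above concrete, here is a minimal Ruby sketch of how a Tika-style string match with an offset range might be evaluated. It assumes offset="0:256" means "the value may start at any byte offset from 0 to 256, inclusive" and that sibling <match> elements are alternatives. The helper names string_match_in_range? and svg_by_magic? are hypothetical; this is not Marcel's internal implementation, and it ignores priority, sub-class-of, glob and root-XML.

```ruby
# Hypothetical helper: true if +pattern+ occurs starting at any byte offset
# in first..last (inclusive) of +data+. This mirrors how a Tika-style
# <match type="string" value="..." offset="first:last"/> rule is read here.
def string_match_in_range?(data, pattern, first, last)
  bytes = data.b
  needle = pattern.b
  (first..last).any? do |offset|
    # byteslice returns nil past the end of the data, which never equals needle
    bytes.byteslice(offset, needle.bytesize) == needle
  end
end

# The two string values from the SVG entry, with &lt; decoded to "<".
SVG_MAGIC_VALUES = ["<!DOCTYPE svg", "<svg"].freeze

# Hypothetical: the entry matches if any of its <match> values matches
# within the 0:256 range.
def svg_by_magic?(data)
  SVG_MAGIC_VALUES.any? { |value| string_match_in_range?(data, value, 0, 256) }
end

svg = %(<?xml version="1.0" encoding="UTF-8"?>\n<svg xmlns="http://www.w3.org/2000/svg"/>)
xml = %(<?xml version="1.0"?>\n<note/>)
puts svg_by_magic?(svg) # prints true
puts svg_by_magic?(xml) # prints false
```

Because this is a plain substring check, it does not look at the character that follows "<svg" at all.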
pjmartorell commented 3 years ago

After the latest changes related to SVG images, the SVG image above is still detected as application/xml. It seems that when <svg is followed by a newline instead of a space, the file is not detected correctly:

<svg
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        ...
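If a detection rule requires a delimiter after <svg (for example, so that a longer element name starting with "svg" does not count), the delimiter check has to accept any whitespace, not only a space. A minimal sketch, assuming a 256-byte window like the offset in the entry above; SVG_ROOT and svg_root_near_start? are hypothetical names, and this is not how Marcel actually implements the check.

```ruby
# Hypothetical rule: "<svg" must be followed by whitespace, ">" or "/" so
# that the match is the svg element itself rather than a longer name.
# \s covers space, tab, CR and LF (plus form feed and vertical tab).
SVG_ROOT = %r{<svg[\s>/]}

# "<svg" plus one delimiter is 5 bytes, so a match starting at byte
# +window+ still fits inside the first window + 5 bytes.
def svg_root_near_start?(data, window: 256)
  header = data.b.byteslice(0, window + 5)
  SVG_ROOT.match?(header)
end

with_space   = %(<svg xmlns="http://www.w3.org/2000/svg"/>)
with_newline = %(<svg\n        xmlns:dc="http://purl.org/dc/elements/1.1/"/>)
puts svg_root_near_start?(with_space)   # prints true
puts svg_root_near_start?(with_newline) # prints true
puts svg_root_near_start?("<svgfoo/>")  # prints false
```

Using a character class like [\s>/] instead of a literal "<svg " is what lets the newline case through.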
eraffel-MDSol commented 3 years ago

I think the version of nokogiri you downgraded to has a security vulnerability: https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-vr8q-g5c7-m54m

Would ~> 1.9 work?
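For reference, ~> 1.9 is RubyGems' pessimistic version constraint, equivalent to >= 1.9 and < 2.0. The sketch below checks a few arbitrary example versions against it using RubyGems' own Gem::Requirement and Gem::Version; it says nothing about which nokogiri releases the linked advisory affects.

```ruby
require "rubygems" # Gem::Requirement and Gem::Version come from RubyGems

# "~> 1.9" is RubyGems' pessimistic constraint: >= 1.9 and < 2.0.
requirement = Gem::Requirement.new("~> 1.9")

%w[1.8.5 1.9.0 1.11.4 2.0.0].each do |version|
  allowed = requirement.satisfied_by?(Gem::Version.new(version))
  puts "#{version}: #{allowed}"
end
# prints:
# 1.8.5: false
# 1.9.0: true
# 1.11.4: true
# 2.0.0: false
```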

georgeclaghorn commented 3 years ago

It’s a development dependency used only on fixed in-tree XML.