rails / marcel

Find the mime type of files, examining file, filename and declared type
Apache License 2.0
387 stars, 67 forks

Chore: Update tika definitions to latest release of Tika 2.7.0 #92

Status: Closed (vipulnsward closed 1 year ago)

vipulnsward commented 1 year ago

What:

Why:

TODO: Fix the spec failures, which I am looking into :-)

vipulnsward commented 1 year ago

These failures seem to be due to the new Tika definitions:

  1) Failure:
Marcel::MimeType::MagicTest#test_gets_type_for_text/html_by_using_only_magic_bytes_text/html/html_with_svg.html [/Users/sward/work/marcel/test/magic_test.rb:10]:
Expected: "text/html"
  Actual: "image/svg+xml"

  2) Failure:
Marcel::MimeType::MagicTest#test_gets_type_for_image/svg+xml_by_using_only_magic_bytes_image/svg+xml/svg_with_xml_declaration.svg [/Users/sward/work/marcel/test/magic_test.rb:10]:
Expected: "image/svg+xml"
  Actual: "application/xml"

Changelog: https://dist.apache.org/repos/dist/release/tika/2.7.0/CHANGES-2.7.0.txt

vipulnsward commented 1 year ago

This needs more work

dkam commented 10 months ago

Hello - I've got a branch with the new Tika definitions and I've updated the MAGIC array to fix both those test failures:

  1. Replaced the new 'text/html' definition with the pre-Tika-import value.
  2. Moved one of the 'image/svg+xml' definitions higher (up next to the other svg+xml entry).
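
To illustrate why moving an entry higher can change the detected type, here is a minimal sketch. It assumes the magic table is scanned in order and the first matching entry wins, which is what the reordering fix above suggests. The tables, patterns, and the `detect` helper are hypothetical and heavily simplified; they are not Marcel's or Tika's actual definitions.

```ruby
# Hypothetical, simplified magic tables: [mime_type, pattern] pairs checked in order.
# The patterns are illustrative only, not the real Tika-derived definitions.
GENERIC_FIRST = [
  ["application/xml", /\A<\?xml/],
  ["image/svg+xml",   /<svg[\s>]/],
].freeze

SPECIFIC_FIRST = [
  ["image/svg+xml",   /<svg[\s>]/],
  ["application/xml", /\A<\?xml/],
].freeze

# Return the type of the first entry whose pattern matches the leading bytes,
# or nil when nothing matches.
def detect(table, data)
  head = data[0, 256]
  entry = table.find { |_type, pattern| pattern.match?(head) }
  entry && entry.first
end

# An SVG document that starts with an XML declaration, like the failing fixture.
SVG = %(<?xml version="1.0"?>\n<svg xmlns="http://www.w3.org/2000/svg"></svg>)

puts detect(GENERIC_FIRST, SVG)   # application/xml
puts detect(SPECIFIC_FIRST, SVG)  # image/svg+xml
```

With the generic XML entry first, the SVG file is reported as "application/xml", which mirrors the second failure above; placing the more specific entry first restores "image/svg+xml".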