Characterization with FITS can report multiple mime types, which can break the File

botimer commented 6 years ago

When ingesting certain types of files, FITS characterization can report multiple mime types as a single comma-separated attribute value. Neither the CharacterizationService nor the OM document detects or cleans this up.

Once the File is updated and persisted to Fedora, the invalid mime type causes a 500 error on any GET or HEAD request for the file (Jersey throws a parsing exception on the comma).

It's reasonable to say that both FITS (or Droid) and Fedora are behaving badly in this case, but it is also reasonable to be able to clean up or transform values after extraction. OM does not have a mechanism for doing transformation of a term beyond the hoisting done with proxied terms, so a new method on the FitsDocument is the best candidate.
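As a rough illustration of the cleanup such a method might perform, here is a minimal sketch. The name `clean_mime_type` is hypothetical and not part of hydra-works or OM, and keeping the first listed type is an assumption, not a decision made in this issue.

```ruby
# Hypothetical helper; not part of hydra-works or OM.
# Assumption: when FITS reports several types, the first one is preferred.
def clean_mime_type(raw)
  return nil if raw.nil?

  # Split on commas, trim surrounding whitespace, and drop empty entries.
  candidates = raw.split(',').map(&:strip).reject(&:empty?)
  candidates.first
end

clean_mime_type('application/netcdf, application/x-netcdf') # => "application/netcdf"
clean_mime_type('image/png')                                 # => "image/png"
```

Returning a single value keeps the stored mime type parseable as one media type. Whether the first entry is the right one to keep is a separate policy question.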

A snippet showing the problematic mimetype attribute:

<fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd" version="1.2.0" timestamp="2/7/18 10:03 PM">
  <identification status="SINGLE_RESULT">
    <identity format="netCDF-3 Classic" mimetype="application/netcdf, application/x-netcdf" toolname="FITS" toolversion="1.2.0">
      <tool toolname="Droid" toolversion="6.3" />
      <externalIdentifier toolname="Droid" toolversion="6.3" type="puid">fmt/282</externalIdentifier>
    </identity>
  </identification>
</fits>
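To check whether a given FITS output has this problem, something like the following could be run outside the application. It is only a sketch using Ruby's bundled REXML parser. The helper name `reported_mime_types` is hypothetical, and the XML is a trimmed copy of the snippet above with the root element closed.

```ruby
require 'rexml/document'

# Trimmed copy of the FITS output above, with the root element closed.
FITS_XML = <<~XML
  <fits xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" version="1.2.0">
    <identification status="SINGLE_RESULT">
      <identity format="netCDF-3 Classic" mimetype="application/netcdf, application/x-netcdf" toolname="FITS" toolversion="1.2.0" />
    </identification>
  </fits>
XML

# Hypothetical helper: collect the mimetype attribute of every <identity>
# element and split each value into its individual types.
def reported_mime_types(xml)
  types = []
  REXML::Document.new(xml).root.each_recursive do |element|
    next unless element.name == 'identity'

    value = element.attributes['mimetype']
    types.concat(value.split(',').map(&:strip)) if value
  end
  types
end

types = reported_mime_types(FITS_XML)
puts types.inspect
puts 'multiple mime types reported' if types.size > 1
```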

botimer commented 6 years ago

Flagging @fritzfreiheit, @njaffer, and @mutanthumb.

botimer commented 6 years ago

Relates to samvera/hydra-works#334: Characterization with FITS can report multiple mime types, which can break the File