samvera / hydra-works

A Ruby gem implementation of the PCDM Works domain model based on the Samvera software stack

Multiple, conflicting values for dateCreated in characterization metadata #336

Closed: kefo closed this issue 6 years ago

kefo commented 6 years ago

We have more than 200,000 pcdm:Files in our collection (probably mostly images, though I can't confirm this) with multiple, unequal ebucore:dateCreated values.

It's easy to see how this happens by comparing the FITS output:

```xml
  <fileinfo>
    <size toolname="Jhove" toolversion="1.16">862707</size>
    <creatingApplicationName toolname="Exiftool" toolversion="10.00" status="CONFLICT">Canon EOS 5D Mark III</creatingApplicationName>
    <creatingApplicationName toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="CONFLICT">Adobe Photoshop Lightroom 6.7 (Macintosh)</creatingApplicationName>
    <lastmodified toolname="Exiftool" toolversion="10.00" status="CONFLICT">2017:05:30 15:07:03</lastmodified>
    <lastmodified toolname="Tika" toolversion="1.10" status="CONFLICT">2017-05-30T15:07:03</lastmodified>
    <created toolname="Exiftool" toolversion="10.00" status="CONFLICT">2017:05:26 09:43:42</created>
    <created toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="CONFLICT">2017:05:30 15:07:03</created>
    <filepath toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">/Users/kford1/Work/fits/samples/image/IM023744_016.jpg</filepath>
    <filename toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">IM023744_016.jpg</filename>
    <md5checksum toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">2e887492065e262fc866a2028c160f4b</md5checksum>
    <fslastmodified toolname="OIS File Information" toolversion="0.2" status="SINGLE_RESULT">1519246100000</fslastmodified>
  </fileinfo>
```

with line 26 of the code that parses the FITS document:

https://github.com/samvera/hydra-works/blob/d41be82540eca9d3a76301df3dcea5b74c9d5937/lib/hydra/works/characterization/fits_document.rb#L23-L26

Line 23 (which is why I quoted the whole block) shows a solution that is currently used only for lastmodified: it restricts the value to the Exiftool output. (Oddly, lastmodified is not part of base_schema.rb, but I digress.)

Would it make sense to restrict created to the ExifTool output in the same way? That seems preferable, because NLNZ's created date appears to be the modification date: it matches the lastmodified value reported by both ExifTool and Tika (2017-05-30 15:07:03), assuming we can trust those tools for that information.
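A minimal sketch of the proposed restriction, using Ruby's REXML on the two `created` lines from the output above. It does not reproduce the OM terminology in `fits_document.rb`; the XPath strings are assumptions modeled on the Exiftool-only rule described for lastmodified. Without a `toolname` predicate both conflicting values match; with `[@toolname='Exiftool']` only one does.

```ruby
require "rexml/document"

# Trimmed copy of the <fileinfo> fragment quoted above; in a full FITS
# document this element sits inside a <fits> root, omitted here.
FITS_FILEINFO = <<~XML
  <fileinfo>
    <created toolname="Exiftool" toolversion="10.00" status="CONFLICT">2017:05:26 09:43:42</created>
    <created toolname="NLNZ Metadata Extractor" toolversion="3.6GA" status="CONFLICT">2017:05:30 15:07:03</created>
  </fileinfo>
XML

doc = REXML::Document.new(FITS_FILEINFO)

# Unrestricted path: every tool's <created> element matches, so two values come back.
all_created = REXML::XPath.match(doc, "/fileinfo/created").map(&:text)

# Restricted path (assumed form), analogous to the Exiftool-only rule for lastmodified.
exiftool_created = REXML::XPath.match(doc, "/fileinfo/created[@toolname='Exiftool']").map(&:text)

p all_created       # prints ["2017:05:26 09:43:42", "2017:05:30 15:07:03"]
p exiftool_created  # prints ["2017:05:26 09:43:42"]
```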

If that is too restrictive, is there a way to select a single creation date in a more sophisticated manner (if Exiftool, else if Tika, else if ...)?
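The priority-based selection could look roughly like the sketch below. `TOOL_PRIORITY` and `preferred_value` are hypothetical names, and the Exiftool-then-Tika order simply follows the suggestion above; this is not existing hydra-works behavior. It takes `[toolname, value]` pairs (for example, one per `fileinfo/created` element) and returns the value from the highest-priority tool present.

```ruby
# Hypothetical priority order, taken from the suggestion above (an assumption).
TOOL_PRIORITY = %w[Exiftool Tika].freeze

# Return one value from [toolname, value] pairs by walking the priority list.
def preferred_value(candidates, priority = TOOL_PRIORITY)
  priority.each do |tool|
    match = candidates.find { |name, _value| name == tool }
    return match[1] if match
  end
  # No preferred tool reported a value: fall back to the first candidate
  # in document order, or nil when there are no candidates at all.
  candidates.first&.last
end

candidates = [
  ["NLNZ Metadata Extractor", "2017:05:30 15:07:03"],
  ["Exiftool", "2017:05:26 09:43:42"]
]
puts preferred_value(candidates)                                            # prints 2017:05:26 09:43:42
puts preferred_value([["NLNZ Metadata Extractor", "2017:05:30 15:07:03"]])  # prints 2017:05:30 15:07:03
p preferred_value([])                                                       # prints nil
```

Falling back to the first remaining candidate keeps a date for files that neither Exiftool nor Tika characterized; returning nil instead would leave the field empty rather than risk storing a modification date. Note also that the tools format dates differently (Exiftool's `2017:05:30 15:07:03` versus Tika's `2017-05-30T15:07:03` in the lastmodified output above), so a cross-tool fallback may need to normalize the format.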

escowles commented 6 years ago

@kefo 👍 that sounds like a good idea to me