sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Normalize Stanford dates for date slider #247

Closed ndushay closed 5 years ago

ndushay commented 5 years ago

To populate cho_date_range_norm for Stanford collections:

Note that MODS dateCreated appears only 9 times, in just 6 of our 107 records:

cm881bj1960.mods
41:    <dateCreated encoding="w3cdtf" keyDate="yes" point="start" qualifier="approximate">1825</dateCreated>
42:    <dateCreated encoding="w3cdtf" point="end" qualifier="approximate">1875</dateCreated>
43:    <dateCreated encoding="w3cdtf" keyDate="yes"/>

dx161mc8937.mods
57:    <dateCreated encoding="w3cdtf" keyDate="yes">1224</dateCreated>

mm896qm6737.mods
8:    <mods:dateCreated encoding="w3cdtf" keyDate="yes" qualifier="inferred">1725</mods:dateCreated>

nk663xb7601.mods
55:    <dateCreated encoding="w3cdtf" keyDate="yes">1804</dateCreated>

rz546pm0247.mods
8:    <mods:dateCreated encoding="w3cdtf" keyDate="yes" qualifier="approximate" point="start">1700</mods:dateCreated>
13:    <mods:dateCreated encoding="w3cdtf" keyDate="no" qualifier="approximate" point="end">1750</mods:dateCreated>

tk780vf9050.mods
84:    <dateCreated encoding="w3cdtf" keyDate="yes">1777</dateCreated>

$ ack dateIssued | wc -l

MODS dateIssued has 149 hits.

The stanford-mods algorithm for pub date (https://github.com/sul-dlss/stanford-mods/blob/master/lib/stanford-mods/origin_info.rb#L188-L193) looks in dateIssued. We can prioritize dateCreated and fall back to dateIssued when there is no dateCreated ...

I think this would look like the penn_museum date macro: we'd probably read the raw value from the record rather than from the accumulator, and then turn it into an array.

In the current data, I'm only seeing single years.

The macro will look in the record var.

ndushay commented 5 years ago

I wrote this up before the parse_date gem had a lot of zing to it; let's talk through what is needed before running with what I wrote above. The patterns are valid -- it's the "what is the best way to get the date range" that might have changed, as a bunch of these patterns are now handled by ParseDate.range_array.

ndushay commented 5 years ago

Algorithm for dates:

  1. look in dateCreated
    • if there are multiples, look for attribute point for "start" and "end" and use those values for range
    • if no "start" and "end", look for keyDate and parse it for range
    • if no keyDate, take the first value and parse it for range
  2. if no dateCreated, look in dateValid
    • if there are multiples, look for attribute point for "start" and "end" and use those values for range
    • if no "start" and "end", look for keyDate and parse it for range
    • if no keyDate, take the first value and parse it for range
  3. if no dateCreated and no dateValid, look in dateIssued
    • if there are multiples, look for attribute point for "start" and "end" and use those values for range
    • if no "start" and "end", look for keyDate and parse it for range
    • if no keyDate, take the first value and parse it for range
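
The steps above could be sketched roughly as follows. This is only an illustration, not the dlme-transform macro: `DateElement`, `parse_year`, `range_from` and `year_range` are hypothetical names, each date element is modelled as a plain struct rather than read from the MODS XML, and `parse_year` is a stand-in that handles only bare four-digit years (in practice the parse_date gem would presumably do that work). Skipping empty elements, such as the empty keyDate element in cm881bj1960.mods, is also an assumption.

```ruby
# Hypothetical stand-in for one MODS date element (not a dlme-transform class).
DateElement = Struct.new(:name, :value, :point, :key_date, keyword_init: true)

# Fields in the priority order given in the algorithm.
FIELD_PRIORITY = %w[dateCreated dateValid dateIssued].freeze

# Stand-in parser: handles only a bare year such as "1825".
def parse_year(value)
  Integer(value.strip, 10)
end

# Apply the per-field rules: point start/end, then keyDate, then first value.
def range_from(candidates)
  start  = candidates.find { |el| el.point == 'start' }
  finish = candidates.find { |el| el.point == 'end' }
  return (parse_year(start.value)..parse_year(finish.value)).to_a if start && finish

  chosen = candidates.find { |el| el.key_date == 'yes' } || candidates.first
  [parse_year(chosen.value)]
end

# Return an array of years from the first field that has a non-empty value,
# or nil when none of the three fields is populated.
def year_range(elements)
  FIELD_PRIORITY.each do |field|
    # Assumption: elements with no text are ignored.
    candidates = elements.select { |el| el.name == field && !el.value.to_s.strip.empty? }
    return range_from(candidates) unless candidates.empty?
  end
  nil
end

# The three dateCreated elements from cm881bj1960.mods.
cm881bj1960 = [
  DateElement.new(name: 'dateCreated', value: '1825', point: 'start', key_date: 'yes'),
  DateElement.new(name: 'dateCreated', value: '1875', point: 'end'),
  DateElement.new(name: 'dateCreated', value: '', key_date: 'yes')
]
range = year_range(cm881bj1960)
puts "#{range.first}..#{range.last} (#{range.size} years)" # prints "1825..1875 (51 years)"
```

Since the same three rules apply to each field, the sketch keeps them in one helper and loops over the field names in priority order.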