ucldc / rikolti

calisphere harvester 2.0
BSD 3-Clause "New" or "Revised" License

Address type enrichments (upper/lowercase normalization; dictionary) #668

Open christinklez opened 8 months ago

christinklez commented 8 months ago

constants.py (update and maintain the type dictionary): https://github.com/ucldc/rikolti/blob/0e03c4fe6a08172ed9affaf04b9f34c9bcb5d8c8/metadata_mapper/mappers/constants.py

Type case normalization (Image → image)

Running list of potential additions to the enrichments. Add to: https://github.com/ucldc/rikolti/blob/0e03c4fe6a08172ed9affaf04b9f34c9bcb5d8c8/metadata_mapper/mappers/constants.py

Values already on the list but not normalizing (investigate?): https://github.com/ucldc/rikolti/blob/0e03c4fe6a08172ed9affaf04b9f34c9bcb5d8c8/metadata_mapper/mappers/constants.py

christinklez commented 7 months ago

Mapper: omeka.omeka Collection: 26725

Please add this type vocab:

christinklez commented 7 months ago

Mapper: oai.content_dm.csu_sac Collection: 11068

Please see the additional type vocab comments in #664, which has more details about where the list below comes from.

Essentially proposing to also add:

christinklez commented 6 months ago

Mapper: oai.content_dm.lapl Collection: 26094

This collection is using the Registry fill value: Image

There is a capitalization issue. Expected: image. Actual: Image.
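For illustration, the case normalization described here (Image → image) could be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the code in constants.py: the `TYPE_VARIANTS` table, its entries, and the `normalize_type` function are hypothetical names made up for this example.

```python
from typing import Optional

# Hypothetical lookup table, keyed by lowercased variant.
# The real type dictionary lives in constants.py and is not reproduced here.
TYPE_VARIANTS = {
    "image": "image",
    "sound": "sound",
    "text": "text",
}


def normalize_type(value: str) -> Optional[str]:
    """Return the lowercase canonical type for a raw value, or None if unknown.

    Whitespace is collapsed and case is folded before the lookup, so
    "Image", "IMAGE" and " image " all resolve to "image".
    """
    key = " ".join(value.split()).casefold()
    return TYPE_VARIANTS.get(key)


if __name__ == "__main__":
    for raw in ["Image", " IMAGE ", "not a known type"]:
        print(repr(raw), "->", normalize_type(raw))
```

Returning None for unrecognized values, rather than passing them through, would make it easy to log values that are not normalizing so they can be investigated or added to the dictionary.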

christinklez commented 4 months ago

Per #879 -- "We will not need to dive deeper into the details of the type enrichment in order to solve the other three issues, so we are choosing to continue holding on this issue until it becomes a priority, let's reprioritize this issue against other issues in the To do list."

christinklez commented 4 months ago

As we have an updated strategy to address type capitalization normalization, we will prioritize this as part of post-MVP work.

aturner commented 3 months ago

Proposed additions, based on UCSD Library harvests:

Mapper: ucsd_blacklight.ucsd_blacklight Collections: (multiple)

The source metadata has these variants of DC Type "sound" in the "resource_type_tesim" field. UCSD Library isn't inclined to update the source metadata, as they're using these values for specific facets in their digital collections portal:

For the time being, Gabriela updated the Registry to blanket-supply the DC Type "sound" using overwrite mode.

christinklez commented 1 month ago

Add "sound recording-nonmusical" as a type. See #963.