sul-dlss / dlme-transform

Transforms raw DLME metadata to DLME intermediate representation
Apache License 2.0

Ensure all controlled vocabularies are complete #676

Closed jacobthill closed 3 years ago

jacobthill commented 3 years ago

As the DLME data manager, I need to ensure that certain metadata fields contain only values found in a controlled vocabulary. For example, cho_edm_type and cho_language both have normalized values drawn from a controlled vocabulary. Currently, that controlled vocabulary may be spread across multiple translation maps, so it is possible to introduce typos or variant spellings into a translation map without realizing it. Checking these field values against a controlled vocabulary will prevent this.

Consult the field notes for a full list of fields that are mapped to a controlled vocabulary.
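
To make the check described above concrete, here is a minimal sketch that compares the output values of several translation maps against one controlled vocabulary. Everything in it is an assumption for illustration: the file names, the map contents, the vocabulary terms, and the helper `values_outside_vocabulary` are hypothetical and are not taken from the dlme-transform codebase.

```ruby
require 'yaml'
require 'set'

# Hypothetical controlled vocabulary for cho_edm_type.
# The terms are illustrative placeholders, not the project's actual list.
EDM_TYPE_VOCABULARY = Set['Image', 'Text', 'Sound', 'Video', 'Object'].freeze

# Two hypothetical translation maps, inlined as YAML so the sketch is
# self-contained. The second one contains a deliberate typo ("Imgae").
MAPS = {
  'types_a.yaml' => "photograph: Image\nmanuscript: Text\n",
  'types_b.yaml' => "recording: Sound\nmap: Imgae\n"
}.freeze

# Returns a hash of map name => values that are not in the vocabulary.
# Maps whose values are all valid are left out of the report.
def values_outside_vocabulary(maps, vocabulary)
  maps.each_with_object({}) do |(name, yaml_text), report|
    values = YAML.safe_load(yaml_text).values.uniq
    unknown = values.reject { |value| vocabulary.include?(value) }
    report[name] = unknown unless unknown.empty?
  end
end

REPORT = values_outside_vocabulary(MAPS, EDM_TYPE_VOCABULARY)
REPORT.each { |name, unknown| puts "#{name}: #{unknown.join(', ')}" }
```

Keeping the vocabulary in one place and reporting per file means a typo is traced back to the map that introduced it, even when the vocabulary is used by several maps.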

Requirements:

jacobthill commented 3 years ago

@aaron-collier I'm not sure how you want to revise this, but it just occurred to me that this is technically the same thing as https://github.com/sul-dlss/dlme-transform/issues/699. Maybe the epic should be "validate translation map chain" or something like that. Essentially, all of these issues are "make sure the values from one YAML file are present as keys in another YAML file." I was also thinking about the rake task. It might be useful, but maybe we can avoid relying on the user to remember to run it. Could we simply check all of these translation maps when a transform is triggered and raise an error if any of the values are not present in the appropriate YAML file? If the check fails, print the file and the missing values. This would not require the transform to run to perform the validation, and it would also be robust against the user forgetting a step in the process.
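
The "values from one YAML file are present as keys in another" idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `TranslationMapError`, `missing_keys`, `validate_chain!`, the file names, and the map contents are all hypothetical, and nothing here reflects how dlme-transform actually triggers a transform.

```ruby
require 'yaml'

# Hypothetical error class for this sketch.
class TranslationMapError < StandardError; end

# Values of `source` that do not appear as keys in `target`.
def missing_keys(source, target)
  source.values.flatten.uniq.reject { |value| target.key?(value) }
end

# `chain` is an ordered array of [file_name, hash] pairs. Each map's values
# must be present as keys in the map that follows it. Raises with the file
# name and the missing values, otherwise returns true.
def validate_chain!(chain)
  problems = chain.each_cons(2).filter_map do |(name, source), (next_name, target)|
    missing = missing_keys(source, target)
    "#{name}: values missing from #{next_name}: #{missing.join(', ')}" unless missing.empty?
  end
  raise TranslationMapError, problems.join("\n") unless problems.empty?

  true
end

# Hypothetical maps, inlined as YAML so the sketch is self-contained.
NORMALIZE_LANGUAGE = YAML.safe_load("ara: Arabic\nper: Persian\nota: Ottoman Turkish\n")
LANGUAGE_LABELS = YAML.safe_load("Arabic: label-1\nPersian: label-2\n")
CHAIN = [
  ['normalize_language.yaml', NORMALIZE_LANGUAGE],
  ['language_labels.yaml', LANGUAGE_LABELS]
].freeze

begin
  validate_chain!(CHAIN)
rescue TranslationMapError => e
  # A real transform entry point would abort here instead of continuing.
  puts e.message
end
```

Calling a check like this at the start of a transform, rather than from a separate rake task, is what removes the dependency on the user remembering an extra step.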

aaron-collier commented 3 years ago

@jacobthill, thinking about this: it and a few of the other related tickets are actually tickets about gathering appropriate fixture data. We'll need to gather example data to use in the tests. Let's discuss.

aaron-collier commented 3 years ago

@jacobthill I think you're right, though: this epic captures 699 and 704 as well. I'm going to open individual tickets around tests and updates for each lookup and will likely close 699 and 704 (with a title/description update here).

aaron-collier commented 3 years ago

@jacobthill same as the other ticket, I'm closing this in favor of a rewritten epic in order to keep your original description if needed.