pkiraly / qa-catalogue

QA catalogue – a metadata quality assessment tool for library catalogue records (MARC, PICA)
GNU General Public License v3.0

Remove historical-subfields and historical-codes? #391

Status: closed (nichtich closed this issue 8 months ago)

nichtich commented 9 months ago

What's the purpose of historical-subfields and historical-codes? As far as I understand the source code, these values are never used for validation, are they? I'm asking because these keys are not supported by the current Avram specification, and I wonder what their purpose in qa-catalogue is (apart from historical curiosity). Maybe they could be kept in a file for reference, but historical elements are unlikely to change, so they don't need to be regenerated from code.

By the way, I used this jq call to remove the fields from an existing JSON file:

```
jq 'walk(if type=="object" then with_entries(select(.key!="historical-codes" and .key!="historical-subfields")) else . end)'
```
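For readers without jq, the same recursive filter can be sketched in Python. This is a minimal illustration, not part of qa-catalogue; the function name `strip_historical` is a hypothetical choice. Like the `walk`/`with_entries` call, it drops the two keys from every object at any depth and leaves everything else untouched.

```python
from typing import Any

# Keys removed by the jq filter above.
HISTORICAL_KEYS = {"historical-codes", "historical-subfields"}


def strip_historical(node: Any) -> Any:
    """Recursively drop historical-* keys from every JSON object in the tree."""
    if isinstance(node, dict):
        # Filter this object's keys, then recurse into the remaining values.
        return {
            key: strip_historical(value)
            for key, value in node.items()
            if key not in HISTORICAL_KEYS
        }
    if isinstance(node, list):
        # Arrays may contain objects too, so recurse into each item.
        return [strip_historical(item) for item in node]
    # Strings, numbers, booleans and null are returned unchanged.
    return node
```

To apply it to a file, one would `json.load` the schema, pass it through `strip_historical`, and `json.dump` the result.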

pkiraly commented 9 months ago

They are used in validation: records containing them are flagged as having obsolete data elements. The MARC standard has changed over time, but records do not always follow these changes. See "obsolete code" and "obsolete value" on the Gent catalogue validation page: http://gent.qa-catalogue.eu/metadata-qa/. Have there been similar data element changes in PICA?

nichtich commented 9 months ago

I removed support for deprecated elements from the Avram specification because their validation was undefined, but we could add them back (see https://github.com/gbv/avram/issues/36).

How does the current validation in qa-catalogue work? If a list of codes/subfields contains code/subfield a, will a parallel historical code/subfield a be ignored, or will it still be used for validation?

If obsolete codes, values and subfields are only checked when there is no equal code/value/subfield in the normal list of codes or subfields, then we could encode them in Avram with a new boolean deprecated flag (which could internally be transformed into the current lists of historical codes and subfields).
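Under that condition, a schema carrying a per-entry boolean flag could be split back into the two lists qa-catalogue uses today. The sketch below assumes a hypothetical Avram extension in which each entry of a `codes` or `subfields` object may carry `"deprecated": true`; the flag, the function name `split_deprecated` and the sample labels are illustrative assumptions, not part of the current Avram specification.

```python
from typing import Any, Dict, Tuple

Entry = Dict[str, Any]


def split_deprecated(entries: Dict[str, Entry]) -> Tuple[Dict[str, Entry], Dict[str, Entry]]:
    """Split code/subfield definitions on a hypothetical 'deprecated' flag.

    Returns (current, historical). Both come from one dict, so a key can never
    appear in both lists, which matches the assumption that historical entries
    are only consulted when no current entry matches.
    """
    current: Dict[str, Entry] = {}
    historical: Dict[str, Entry] = {}
    for key, entry in entries.items():
        # The target list now encodes the flag, so drop it from the copy.
        cleaned = {k: v for k, v in entry.items() if k != "deprecated"}
        if entry.get("deprecated", False):
            historical[key] = cleaned
        else:
            current[key] = cleaned
    return current, historical


subfields = {
    "a": {"label": "Example current subfield"},
    "d": {"label": "Example obsolete subfield", "deprecated": True},
}
current, historical = split_deprecated(subfields)
```

Here `current` holds only `a` and `historical` holds only `d`, each without the flag.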

pkiraly commented 9 months ago

The validation first checks the current definition, and only if the value is not defined there does it check the list of "historical" (obsolete, deprecated) codes. I am OK with the deprecated flag. One more note: the MARC standard contains other information that QA catalogue doesn't record, but which it might record in the future or which other applications might want: when an element was removed, and which schema used it (e.g. UKMARC).
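The lookup order described in this reply can be sketched as follows. This is a minimal Python illustration, not qa-catalogue's actual implementation; the names `classify` and `Status` are hypothetical. A value found in the current definition is valid even if a historical entry with the same key exists; the historical list is consulted only as a fallback.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Status(Enum):
    VALID = "valid"
    OBSOLETE = "obsolete"  # would be reported as an obsolete code/value
    INVALID = "invalid"


def classify(
    value: str,
    codes: Dict[str, Any],
    historical_codes: Optional[Dict[str, Any]] = None,
) -> Status:
    # 1. Check the current definition first.
    if value in codes:
        return Status.VALID
    # 2. Only if it is not defined there, fall back to the historical list.
    if historical_codes and value in historical_codes:
        return Status.OBSOLETE
    # 3. Neither list defines the value.
    return Status.INVALID
```

Entries in `historical_codes` could carry extra metadata, such as when an element was removed or which schema used it; this lookup only checks for the key and ignores such details.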