scriptotek / mc2skos

Command line script for converting Marc21 Classification and Authority records to SKOS/RDF
The Unlicense

How to manage multilingual labels #41

Open nichtich opened 6 years ago

nichtich commented 6 years ago

Is there a way to put labels in multiple languages into one MARC record, e.g. repeat field 153? If not, should mc2skos provide a method to compare and merge multiple MARC files of the same classification in different languages?

danmichaelo commented 6 years ago

Not that I'm aware of. Neither 100 in authority records nor 153 in classification records is repeatable. There was a discussion paper in 2001 on Multilingual Authority Records that recommended separate records for each language. Interestingly, it mentions something called "context markers", which could perhaps also be used to indicate language in a single-record approach, but I'm not sure what happened to that idea. It mentions a follow-up paper to be prepared "for the midwinter 2002 meeting", but I haven't been able to find that (should have been here I guess).

I've seen model A in use as well. I think GND includes English terms in 4XX fields, but without any language marker, so that's not ideal. We had to prepare a similar file to get our English terms searchable in Primo, though. Not sure what the equivalent of 4XX would be in Marc21 classification.

Merging could be a feature. Not sure if it needs to be part of mc2skos though, or if we can rely on some other RDF tool like riot? If the URIs are based on the classification number or some other common identifier, it should be easy enough to merge the RDF files afterwards, shouldn't it?
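To illustrate why shared URIs make the merge easy, here is a minimal Python sketch. It assumes the triples have already been parsed into plain `(subject, predicate, (text, lang))` tuples and that no blank nodes are involved; the `http://example.org/...` URI and the labels are made-up placeholders, not actual mc2skos output.

```python
from collections import defaultdict

PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical triples from two conversions of the same classification.
# Both use the same concept URI because it is derived from the notation.
german = {
    ("http://example.org/class/AB", PREF_LABEL, ("Beispielklasse", "de")),
}
english = {
    ("http://example.org/class/AB", PREF_LABEL, ("Example class", "en")),
}

# Without blank nodes, merging two graphs is a set union of their triples.
merged = german | english

# Group the labels by concept URI to see the multilingual result.
labels = defaultdict(dict)
for subject, predicate, (text, lang) in merged:
    if predicate == PREF_LABEL:
        labels[subject][lang] = text
```

If the two files minted different URIs for the same class, the union would simply contain two unrelated concepts, which is why a common identifier matters here.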

nichtich commented 6 years ago

Thanks for the background and history. So to create a multilingual KOS from MARC, multiple MARC files have to be converted and merged. Merging is easy in RDF, but making sure that all input files align could cause problems. It may be more reliable to have one master file and additional translation files. The latter should only be used for string properties (skos:prefLabel, skos:altLabel, skos:scopeNote, skos:editorialNote, skos:historyNote). My use case is to help get English translations into the RVK classification.

I think a good solution would be an option to only include string properties and a tool/guideline to merge KOS files.

$ mc2skos master.xml master.ttl
$ mc2skos --stringsOnly translation.xml translation.ttl
$ merge master.ttl translation.ttl > multilingual.ttl

Here merge can be replaced by cat for RDF/Turtle ([nd]json needs another mechanism), but some additional checking would be better, to make sure that the translation does not add any concepts not included in the master. In any case, this checking would be better placed in another tool, e.g. skosify.
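The check described above could look roughly like the following Python sketch. It is not part of mc2skos or skosify: the function name `merge_translation` and the representation of triples as plain `(subject, predicate, object)` tuples are assumptions made for illustration only.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"

# The string properties listed in the comment above.
STRING_PROPERTIES = {
    SKOS + name
    for name in ("prefLabel", "altLabel", "scopeNote", "editorialNote", "historyNote")
}


def merge_translation(master, translation):
    """Merge string properties from `translation` into `master`.

    Both arguments are sets of (subject, predicate, object) tuples.
    Returns (merged, unknown), where `unknown` is the set of translation
    subjects that never occur as a subject in the master.
    """
    known = {s for s, _, _ in master}
    merged = set(master)
    unknown = set()
    for s, p, o in translation:
        if s not in known:
            unknown.add(s)  # concept missing from master: report, do not add
            continue
        if p in STRING_PROPERTIES:
            merged.add((s, p, o))  # translations contribute strings only
    return merged, unknown
```

A caller could treat a non-empty `unknown` set as a warning or as a hard error, depending on how strictly the translation file is expected to follow the master.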