Open kiegel opened 6 years ago
This is complicated, since I think some of this is bad data rather than bad conversion. We'll investigate and report back.
I've also seen the converter create @ru-cyrl language tags where the -cyrl is redundant and, per BCP 47, should be omitted. I've chosen to ignore them for now.
The specs are going to be updated - pretty sure the best solution is to stop adding tags based on 008+$6.
If the MARC record included the language along with the script, it would be different. That is technically possible, and we were going to look into it as well.
In regard to internationalization, the logic for applying language tags needs work for parallel-script fields (880), e.g. with translations or parallel titles.
Incorrect Language Tags and Script Subtags
For example, problems crop up with OCLC # 271414, an English translation of a Russian work.
The label is Cyrillic but in Russian, not English.
The label is English but not Cyrillic. In general, it is vanishingly rare for a string to be both in the English language and in the Cyrillic script.
OCLC # 793950140, a Chinese translation of a Japanese work.
The title in the label is Japanese, not Chinese.
OCLC # 893875561, a Latvian book with a parallel title in Russian.
The title in the label, mainTitle and subtitle is Russian, not Latvian.
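A rough way to flag cases like the ones above is to compare the script subtag of a tag with the script the string is actually written in. The following is a minimal sketch, not the converter's logic: `dominant_script`, `script_matches` and the `SCRIPT_NAMES` table are hypothetical names introduced here, the sample strings are made up, and the first word of each character's Unicode name is used as a stand-in for the Script property because Python's standard library does not expose it.

```python
import unicodedata
from collections import Counter

# Hypothetical mapping from script subtags to Unicode name prefixes.
SCRIPT_NAMES = {"cyrl": "CYRILLIC", "latn": "LATIN"}


def dominant_script(text: str) -> str:
    """Return the most common script prefix among the letters in text."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def script_matches(text: str, script_subtag: str) -> bool:
    """True if the text's dominant script agrees with the script subtag."""
    expected = SCRIPT_NAMES.get(script_subtag.lower())
    return expected is not None and dominant_script(text) == expected


print(script_matches("Война и мир", "cyrl"))    # Cyrillic text, Cyrillic subtag
print(script_matches("War and Peace", "cyrl"))  # Latin text, Cyrillic subtag
```

This only catches script mismatches such as a Latin-script string tagged with a Cyrillic subtag. It cannot tell Russian from another Cyrillic-script language, or a Japanese title written in kanji from a Chinese one, so the language part of the tag still has to come from the data.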
Compliance with IETF RFC 5646
Use of language tags should follow the practices given in IETF RFC 5646 [1]. Concerning the script subtag, on page 12 it states “[it] SHOULD be omitted when it adds no distinguishing value to the tag or when the primary or extended language subtag's record in the subtag registry includes a 'Suppress-Script' field listing the applicable script subtag”.
For example, for OCLC # 1779370:
The registry record for Russian has a Suppress-Script field listing Cyrl, so the script subtag for Cyrillic should be omitted.
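The Suppress-Script rule can be applied mechanically. Below is a minimal sketch under some assumptions: `strip_redundant_script` is a hypothetical helper, `SUPPRESS_SCRIPT` is a small hand-copied excerpt rather than the full IANA Language Subtag Registry, and only simple tags of the form language[-script][-more] are handled.

```python
# Illustrative excerpt only; the authoritative values are the Suppress-Script
# fields in the IANA Language Subtag Registry.
SUPPRESS_SCRIPT = {"en": "Latn", "ru": "Cyrl", "ja": "Jpan"}


def strip_redundant_script(tag: str) -> str:
    """Drop the script subtag when it equals the language's Suppress-Script.

    Assumes language[-script][-more]; extended language subtags,
    grandfathered tags and private-use tags are not handled.
    """
    subtags = tag.split("-")
    if len(subtags) < 2:
        return tag
    language, candidate = subtags[0].lower(), subtags[1]
    # A script subtag is exactly four letters and follows the language.
    is_script = len(candidate) == 4 and candidate.isalpha()
    suppressed = SUPPRESS_SCRIPT.get(language, "")
    if is_script and suppressed.lower() == candidate.lower():
        return "-".join([subtags[0]] + subtags[2:])
    return tag


print(strip_redundant_script("ru-cyrl"))  # ru
print(strip_redundant_script("ru-Latn"))  # ru-Latn
print(strip_redundant_script("zh-Hani"))  # zh-Hani
```

A script subtag that differs from the Suppress-Script value, such as ru-Latn for romanized Russian, does add distinguishing value and is left alone.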
Not Good Practice
Using a language tag for numeric data in bf:part is not wrong, but it is probably not good practice.
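One possible guard, sketched here as an assumption rather than a proposal for the spec: only attach a language tag when the literal contains at least one letter. `needs_language_tag` is a hypothetical name.

```python
def needs_language_tag(value: str) -> bool:
    """Return True if the literal contains at least one letter."""
    return any(ch.isalpha() for ch in value)


print(needs_language_tag("2"))       # False
print(needs_language_tag("Part 2"))  # True
```

Roman numerals such as "IV" consist of letters and would still be tagged under this check, so it is only a first approximation.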
[1] https://tools.ietf.org/html/bcp47