The Glottolog project has more granular coverage of languages and fills several gaps left by the ISO codes. They have also built data links to other projects and sources, such as The Open Language Archives Community, Wikidata, and The Online Database of Interlinear Text. It might be worthwhile to consider implementing their URIs, in particular for the gaps left by ISO codes but perhaps even for all language codes.
As an example of a gap filled by Glottolog, take Christian Palestinian Aramaic (CPA). While both Jewish Palestinian Aramaic and Samaritan Aramaic have ISO 639-3 codes (jpa and sam, respectively), there is no ISO code for CPA. Glottolog provides such a code: chri1239.
As an example of increased granularity, Glottolog provides a URI for Eastern Syriac distinguished from Western Syriac. Currently, we only have a way to distinguish these two at the level of scripts: syr-Syrn vs syr-Syrj.
This dialect granularity may not be useful or meaningful in every case, but the current ISO-based system is constrained to the generic "syr" or "syc" codes. (As a more meaningful example, Glottolog allows more precise designation of Bohairic, Sahidic, etc. in place of ISO's generic "cop" for Coptic. This is a case we have run into in the Manuscript catalogue.)
Glottolog's URIs also map to ISO codes where available, so we would retain these links if we used Glottocodes. For example, https://glottolog.org/resource/languoid/id/clas1252 (Classical Syriac) links to the ISO 639-3 code "syc". It may also be possible to traverse their LOD graph to find the next-broadest languoid that has an ISO equivalent, so "East Syriac" would resolve up the tree to "Classical Syriac", which points to ISO "syc".
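The upward traversal could look roughly like the sketch below. It is only an illustration: the LANGUOIDS table is hand-written toy data rather than anything fetched from Glottolog, the parent links are simplified, and nearest_iso is a hypothetical helper name. The ISO value used for Classical Syriac is the ISO 639-3 code listed for clas1252.

```python
from typing import Optional

# Hand-written toy records for illustration only; not fetched from Glottolog.
# Parent links are simplified and real ancestors above these nodes are omitted.
LANGUOIDS = {
    "east2681": {"name": "East Syriac", "parent": "clas1252", "iso": None},
    "clas1252": {"name": "Classical Syriac", "parent": None, "iso": "syc"},
    "chri1239": {"name": "Christian Palestinian Aramaic", "parent": None, "iso": None},
}


def nearest_iso(glottocode: str, languoids: dict = LANGUOIDS) -> Optional[str]:
    """Walk up the tree and return the first ISO code found, or None."""
    seen = set()  # guards against accidental cycles in the data
    current = glottocode
    while current is not None and current not in seen:
        seen.add(current)
        record = languoids.get(current)
        if record is None:
            return None  # unknown Glottocode
        if record["iso"]:
            return record["iso"]
        current = record["parent"]
    return None
```

With this toy data, nearest_iso("east2681") climbs one level to Classical Syriac and returns its ISO code, while nearest_iso("chri1239") returns None because no ancestor with an ISO code is recorded.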
We still need to determine exactly how to implement these in @xml:lang attributes. For now, we can use the ISO code when it is available and use an un-prefixed Glottocode, e.g. east2681, for other languages.
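The interim rule above can be sketched as follows. This is a minimal illustration, not a settled implementation: xml_lang_value is a hypothetical helper, and the pattern assumes the usual Glottocode shape of four lowercase alphanumeric characters followed by four digits.

```python
import re
from typing import Optional

# Assumed Glottocode shape: four lowercase alphanumerics, then four digits.
GLOTTOCODE = re.compile(r"[a-z0-9]{4}[0-9]{4}")


def xml_lang_value(iso_code: Optional[str], glottocode: Optional[str]) -> str:
    """Prefer the ISO code; fall back to an un-prefixed Glottocode."""
    if iso_code:
        return iso_code
    if glottocode and GLOTTOCODE.fullmatch(glottocode):
        return glottocode
    raise ValueError("no ISO code and no well-formed Glottocode supplied")
```

For example, xml_lang_value("jpa", None) yields "jpa", while xml_lang_value(None, "chri1239") falls back to "chri1239".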
We should discuss further whether or not we should serialize everything to Glottolog for the sake of accuracy and precision.