Open davidmef opened 2 years ago
I know this standard, but I don't like it. I prefer my own style: 'ISO 639-3' - 'ISO 15924' - 'ISO 3166-1 alpha-2'
The script is before the country/region, because script changes are less frequent than country/region changes.
For example, eng-Latn-US means English written in the Latin script, as used in the United States of America.
Its advantage is that it presumes nothing. Kazakhstan is currently shifting from Cyrillic to Latin, so according to IETF, Kazakh in Cyrillic is just 'kk', but once the shift is complete it will be unclear which script a bare 'kk' refers to.
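To make the proposed style concrete, here is a minimal sketch of how such a tag could be split into its three parts. The names `TAG_RE`, `FullTag` and `parse_full_tag` are hypothetical, and the pattern only checks the shape of each part (three, four and two letters), not whether the codes actually exist in the ISO registries.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for the language-script-region style described above:
# 3 lowercase letters (ISO 639-3), 4 letters in title case (ISO 15924),
# 2 uppercase letters (ISO 3166-1 alpha-2).
TAG_RE = re.compile(r"([a-z]{3})-([A-Z][a-z]{3})-([A-Z]{2})")


class FullTag(NamedTuple):
    language: str
    script: str
    region: str


def parse_full_tag(tag: str) -> FullTag:
    """Split a tag such as 'eng-Latn-US' into language, script and region.

    Only the shape is validated; the codes are not looked up anywhere.
    """
    match = TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a language-script-region tag: {tag!r}")
    return FullTag(*match.groups())


if __name__ == "__main__":
    print(parse_full_tag("eng-Latn-US"))
    print(parse_full_tag("kaz-Cyrl-KZ"))
```

Because the script is always spelled out, Kazakh in Cyrillic ('kaz-Cyrl-KZ') and Kazakh in Latin ('kaz-Latn-KZ') stay distinct no matter which one is the default at a given time.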
As a suggestion: I see in https://lexica.github.io/lexica/ that you use identifiers for languages that are inspired by IETF language tags (https://en.wikipedia.org/wiki/IETF_language_tag, https://tools.ietf.org/rfc/bcp/bcp47.txt) but do not quite follow them. You should use private-use subtags (the part after '-x-') when there is no official code for the intended language variant, and '-' instead of '_' as the separator. So you should use something like [ "ca", "de-DE", "de-DE-x-nodiacri", "en-GB", "en-US", "es", "es-x-soloenne", "fa", "fr-FR", "fr-FR-x-nodiacri", "hu", "it", "ja", "nl", "pl", "pt-BR", "pt-BR-x-nodiacri", "ru", "ru-x-extended", "uk" ]. This would make interoperation with other programs and projects easier. It would also make the codes for languages added in the future more predictable.
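As a rough illustration of the suggestion, the sketch below builds tags of that shape from an underscore-separated identifier. The function names and the example inputs ("de_DE", "pt_BR") are assumptions made for illustration, not Lexica's actual identifiers or code. The only rule enforced is the BCP 47 limit that a private-use subtag is 1 to 8 ASCII letters or digits, which is why the examples above use abbreviations such as "nodiacri".

```python
import re
from typing import Optional

# BCP 47 private-use subtags (after the "x" singleton) are 1-8 alphanumerics.
PRIVATE_RE = re.compile(r"[A-Za-z0-9]{1,8}")


def to_bcp47(language: str, region: Optional[str] = None,
             private: Optional[str] = None) -> str:
    """Join the parts with '-' and put any custom variant behind '-x-'."""
    parts = [language.lower()]
    if region:
        parts.append(region.upper())
    if private:
        if PRIVATE_RE.fullmatch(private) is None:
            raise ValueError(f"invalid private-use subtag: {private!r}")
        parts += ["x", private.lower()]
    return "-".join(parts)


def from_underscored(identifier: str, private: Optional[str] = None) -> str:
    """Convert a hypothetical 'll' or 'll_RR' identifier to a hyphenated tag."""
    language, _, region = identifier.partition("_")
    return to_bcp47(language, region or None, private)


if __name__ == "__main__":
    print(from_underscored("de_DE", private="nodiacri"))
    print(from_underscored("es", private="soloenne"))
    print(from_underscored("pt_BR"))
```

A variant name longer than eight characters is rejected rather than silently truncated, so the abbreviation stays an explicit choice.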