nager / Nager.Country

Worldwide Country Informations (ISO-3166-1 Alpha2, ISO-3166-1 Alpha3, ISO 639-1)
MIT License

Adding three-letter ISO code for languages #17

Open CasperWSchmidt opened 1 year ago

CasperWSchmidt commented 1 year ago

Hi there. We currently validate language codes in our system with the regex ^[a-z]{3}$, but I would like to tighten the validation to actual ISO codes. From what I can see in this repo, only the two-letter ISO codes are part of the translations. Is it feasible to add the three-letter ISO codes as well?
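To illustrate the difference between the regex check and validation against real codes, here is a minimal sketch. The class name LanguageCodeValidator and the five sample codes are assumptions made for illustration only; they are not part of Nager.Country, and a real list would contain every code in use.

```csharp
using System;
using System.Collections.Generic;

public static class LanguageCodeValidator
{
    // Deliberately tiny sample (assumption): a real set would hold all three-letter codes in use.
    private static readonly HashSet<string> KnownAlpha3Codes =
        new HashSet<string>(StringComparer.Ordinal) { "dan", "deu", "eng", "fra", "spa" };

    // Returns true only for codes in the known set. The regex ^[a-z]{3}$ would also
    // accept strings such as "zzz" that are merely three lowercase letters.
    public static bool IsValid(string code)
    {
        return code != null && KnownAlpha3Codes.Contains(code);
    }
}
```

With this in place, LanguageCodeValidator.IsValid("deu") passes while LanguageCodeValidator.IsValid("zzz") is rejected, even though both match the regex.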

Also, language info is not part of the main package Nager.Country but of Nager.Country.Translation. Wouldn't it be relevant to have the spoken language(s) of a country in the main package, just like currencies? The translations could then stay in a separate package to keep the size down (as noted in #2).

tinohager commented 1 year ago

I think we only need a dictionary with the mapping https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

Do you think ISO 639-3 is the best language code standard to use?

CasperWSchmidt commented 1 year ago

Well, according to https://iso639-3.sil.org/about/relationships, ISO 639-3 was devised to provide a comprehensive set of identifiers for all languages for use in a wide range of applications, including linguistics, lexicography and internationalization of information systems. The page also describes the differences between 639-1, 639-2 and 639-3. So basically, the two-letter ISO 639-1 standard is a subset of the three-letter standard ISO 639-3.

I believe the best standard depends on what it should be used for. Is it simply a set of "overall"/"main" languages spoken in the country, or is it necessary to have more fine-grained options (Arabic is an example)?

IngBertolini commented 1 year ago

I think that would be useful too! It would be nice to also have three-letter codes for languages, just as there are for countries. It would make this library even more complete and robust!

tinohager commented 1 year ago

Can someone validate the data? Data source: https://datahub.io/core/language-codes

using System;
using System.Collections.Generic;

public class Program {
    public static void Main() {
        // Maps ISO 639-1 two-letter codes to ISO 639-2 bibliographic (B) three-letter codes.
        var items = new Dictionary<string, string>();
        items.Add("aa", "aar");
        items.Add("ab", "abk");
        items.Add("af", "afr");
        items.Add("ak", "aka");
        items.Add("sq", "alb");
        items.Add("am", "amh");
        items.Add("ar", "ara");
        items.Add("an", "arg");
        items.Add("hy", "arm");
        items.Add("as", "asm");
        items.Add("av", "ava");
        items.Add("ae", "ave");
        items.Add("ay", "aym");
        items.Add("az", "aze");
        items.Add("ba", "bak");
        items.Add("bm", "bam");
        items.Add("eu", "baq");
        items.Add("be", "bel");
        items.Add("bn", "ben");
        items.Add("bh", "bih");
        items.Add("bi", "bis");
        items.Add("bs", "bos");
        items.Add("br", "bre");
        items.Add("bg", "bul");
        items.Add("my", "bur");
        items.Add("ca", "cat");
        items.Add("ch", "cha");
        items.Add("ce", "che");
        items.Add("zh", "chi");
        items.Add("cu", "chu");
        items.Add("cv", "chv");
        items.Add("kw", "cor");
        items.Add("co", "cos");
        items.Add("cr", "cre");
        items.Add("cs", "cze");
        items.Add("da", "dan");
        items.Add("dv", "div");
        items.Add("nl", "dut");
        items.Add("dz", "dzo");
        items.Add("en", "eng");
        items.Add("eo", "epo");
        items.Add("et", "est");
        items.Add("ee", "ewe");
        items.Add("fo", "fao");
        items.Add("fj", "fij");
        items.Add("fi", "fin");
        items.Add("fr", "fre");
        items.Add("fy", "fry");
        items.Add("ff", "ful");
        items.Add("ka", "geo");
        items.Add("de", "ger");
        items.Add("gd", "gla");
        items.Add("ga", "gle");
        items.Add("gl", "glg");
        items.Add("gv", "glv");
        items.Add("el", "gre");
        items.Add("gn", "grn");
        items.Add("gu", "guj");
        items.Add("ht", "hat");
        items.Add("ha", "hau");
        items.Add("he", "heb");
        items.Add("hz", "her");
        items.Add("hi", "hin");
        items.Add("ho", "hmo");
        items.Add("hr", "hrv");
        items.Add("hu", "hun");
        items.Add("ig", "ibo");
        items.Add("is", "ice");
        items.Add("io", "ido");
        items.Add("ii", "iii");
        items.Add("iu", "iku");
        items.Add("ie", "ile");
        items.Add("ia", "ina");
        items.Add("id", "ind");
        items.Add("ik", "ipk");
        items.Add("it", "ita");
        items.Add("jv", "jav");
        items.Add("ja", "jpn");
        items.Add("kl", "kal");
        items.Add("kn", "kan");
        items.Add("ks", "kas");
        items.Add("kr", "kau");
        items.Add("kk", "kaz");
        items.Add("km", "khm");
        items.Add("ki", "kik");
        items.Add("rw", "kin");
        items.Add("ky", "kir");
        items.Add("kv", "kom");
        items.Add("kg", "kon");
        items.Add("ko", "kor");
        items.Add("kj", "kua");
        items.Add("ku", "kur");
        items.Add("lo", "lao");
        items.Add("la", "lat");
        items.Add("lv", "lav");
        items.Add("li", "lim");
        items.Add("ln", "lin");
        items.Add("lt", "lit");
        items.Add("lb", "ltz");
        items.Add("lu", "lub");
        items.Add("lg", "lug");
        items.Add("mk", "mac");
        items.Add("mh", "mah");
        items.Add("ml", "mal");
        items.Add("mi", "mao");
        items.Add("mr", "mar");
        items.Add("ms", "may");
        items.Add("mg", "mlg");
        items.Add("mt", "mlt");
        items.Add("mn", "mon");
        items.Add("na", "nau");
        items.Add("nv", "nav");
        items.Add("nr", "nbl");
        items.Add("nd", "nde");
        items.Add("ng", "ndo");
        items.Add("ne", "nep");
        items.Add("nn", "nno");
        items.Add("nb", "nob");
        items.Add("no", "nor");
        items.Add("ny", "nya");
        items.Add("oc", "oci");
        items.Add("oj", "oji");
        items.Add("or", "ori");
        items.Add("om", "orm");
        items.Add("os", "oss");
        items.Add("pa", "pan");
        items.Add("fa", "per");
        items.Add("pi", "pli");
        items.Add("pl", "pol");
        items.Add("pt", "por");
        items.Add("ps", "pus");
        items.Add("qu", "que");
        items.Add("rm", "roh");
        items.Add("ro", "rum");
        items.Add("rn", "run");
        items.Add("ru", "rus");
        items.Add("sg", "sag");
        items.Add("sa", "san");
        items.Add("si", "sin");
        items.Add("sk", "slo");
        items.Add("sl", "slv");
        items.Add("se", "sme");
        items.Add("sm", "smo");
        items.Add("sn", "sna");
        items.Add("sd", "snd");
        items.Add("so", "som");
        items.Add("st", "sot");
        items.Add("es", "spa");
        items.Add("sc", "srd");
        items.Add("sr", "srp");
        items.Add("ss", "ssw");
        items.Add("su", "sun");
        items.Add("sw", "swa");
        items.Add("sv", "swe");
        items.Add("ty", "tah");
        items.Add("ta", "tam");
        items.Add("tt", "tat");
        items.Add("te", "tel");
        items.Add("tg", "tgk");
        items.Add("tl", "tgl");
        items.Add("th", "tha");
        items.Add("bo", "tib");
        items.Add("ti", "tir");
        items.Add("to", "ton");
        items.Add("tn", "tsn");
        items.Add("ts", "tso");
        items.Add("tk", "tuk");
        items.Add("tr", "tur");
        items.Add("tw", "twi");
        items.Add("ug", "uig");
        items.Add("uk", "ukr");
        items.Add("ur", "urd");
        items.Add("uz", "uzb");
        items.Add("ve", "ven");
        items.Add("vi", "vie");
        items.Add("vo", "vol");
        items.Add("cy", "wel");
        items.Add("wa", "wln");
        items.Add("wo", "wol");
        items.Add("xh", "xho");
        items.Add("yi", "yid");
        items.Add("yo", "yor");
        items.Add("za", "zha");
        items.Add("zu", "zul");
    }
}

IngBertolini commented 1 year ago

Hello! I tried to validate them and they are correct, but they seem to be the three-letter codes from the ISO 639-2 standard, which are based on English names, rather than the ISO 639-3 codes, which I think are more international and standard. @CasperWSchmidt, what do you think?

In addition, according to Wikipedia, the code "bh" is deprecated and no longer used (it is also present in the LanguageCode enum).

These are the codes in the ISO 639-3 standard (without "bh"):

var items = new Dictionary<string, string>();
items.Add("aa", "aar");
items.Add("ab", "abk");
items.Add("af", "afr");
items.Add("ak", "aka");
items.Add("sq", "sqi");
items.Add("am", "amh");
items.Add("ar", "ara");
items.Add("an", "arg");
items.Add("hy", "hye");
items.Add("as", "asm");
items.Add("av", "ava");
items.Add("ae", "ave");
items.Add("ay", "aym");
items.Add("az", "aze");
items.Add("ba", "bak");
items.Add("bm", "bam");
items.Add("eu", "eus");
items.Add("be", "bel");
items.Add("bn", "ben");
items.Add("bi", "bis");
items.Add("bs", "bos");
items.Add("br", "bre");
items.Add("bg", "bul");
items.Add("my", "mya");
items.Add("ca", "cat");
items.Add("ch", "cha");
items.Add("ce", "che");
items.Add("zh", "zho");
items.Add("cu", "chu");
items.Add("cv", "chv");
items.Add("kw", "cor");
items.Add("co", "cos");
items.Add("cr", "cre");
items.Add("cs", "ces");
items.Add("da", "dan");
items.Add("dv", "div");
items.Add("nl", "nld");
items.Add("dz", "dzo");
items.Add("en", "eng");
items.Add("eo", "epo");
items.Add("et", "est");
items.Add("ee", "ewe");
items.Add("fo", "fao");
items.Add("fj", "fij");
items.Add("fi", "fin");
items.Add("fr", "fra");
items.Add("fy", "fry");
items.Add("ff", "ful");
items.Add("ka", "kat");
items.Add("de", "deu");
items.Add("gd", "gla");
items.Add("ga", "gle");
items.Add("gl", "glg");
items.Add("gv", "glv");
items.Add("el", "ell");
items.Add("gn", "grn");
items.Add("gu", "guj");
items.Add("ht", "hat");
items.Add("ha", "hau");
items.Add("he", "heb");
items.Add("hz", "her");
items.Add("hi", "hin");
items.Add("ho", "hmo");
items.Add("hr", "hrv");
items.Add("hu", "hun");
items.Add("ig", "ibo");
items.Add("is", "isl");
items.Add("io", "ido");
items.Add("ii", "iii");
items.Add("iu", "iku");
items.Add("ie", "ile");
items.Add("ia", "ina");
items.Add("id", "ind");
items.Add("ik", "ipk");
items.Add("it", "ita");
items.Add("jv", "jav");
items.Add("ja", "jpn");
items.Add("kl", "kal");
items.Add("kn", "kan");
items.Add("ks", "kas");
items.Add("kr", "kau");
items.Add("kk", "kaz");
items.Add("km", "khm");
items.Add("ki", "kik");
items.Add("rw", "kin");
items.Add("ky", "kir");
items.Add("kv", "kom");
items.Add("kg", "kon");
items.Add("ko", "kor");
items.Add("kj", "kua");
items.Add("ku", "kur");
items.Add("lo", "lao");
items.Add("la", "lat");
items.Add("lv", "lav");
items.Add("li", "lim");
items.Add("ln", "lin");
items.Add("lt", "lit");
items.Add("lb", "ltz");
items.Add("lu", "lub");
items.Add("lg", "lug");
items.Add("mk", "mkd");
items.Add("mh", "mah");
items.Add("ml", "mal");
items.Add("mi", "mri");
items.Add("mr", "mar");
items.Add("ms", "msa");
items.Add("mg", "mlg");
items.Add("mt", "mlt");
items.Add("mn", "mon");
items.Add("na", "nau");
items.Add("nv", "nav");
items.Add("nr", "nbl");
items.Add("nd", "nde");
items.Add("ng", "ndo");
items.Add("ne", "nep");
items.Add("nn", "nno");
items.Add("nb", "nob");
items.Add("no", "nor");
items.Add("ny", "nya");
items.Add("oc", "oci");
items.Add("oj", "oji");
items.Add("or", "ori");
items.Add("om", "orm");
items.Add("os", "oss");
items.Add("pa", "pan");
items.Add("fa", "fas");
items.Add("pi", "pli");
items.Add("pl", "pol");
items.Add("pt", "por");
items.Add("ps", "pus");
items.Add("qu", "que");
items.Add("rm", "roh");
items.Add("ro", "ron");
items.Add("rn", "run");
items.Add("ru", "rus");
items.Add("sg", "sag");
items.Add("sa", "san");
items.Add("si", "sin");
items.Add("sk", "slk");
items.Add("sl", "slv");
items.Add("se", "sme");
items.Add("sm", "smo");
items.Add("sn", "sna");
items.Add("sd", "snd");
items.Add("so", "som");
items.Add("st", "sot");
items.Add("es", "spa");
items.Add("sc", "srd");
items.Add("sr", "srp");
items.Add("ss", "ssw");
items.Add("su", "sun");
items.Add("sw", "swa");
items.Add("sv", "swe");
items.Add("ty", "tah");
items.Add("ta", "tam");
items.Add("tt", "tat");
items.Add("te", "tel");
items.Add("tg", "tgk");
items.Add("tl", "tgl");
items.Add("th", "tha");
items.Add("bo", "bod");
items.Add("ti", "tir");
items.Add("to", "ton");
items.Add("tn", "tsn");
items.Add("ts", "tso");
items.Add("tk", "tuk");
items.Add("tr", "tur");
items.Add("tw", "twi");
items.Add("ug", "uig");
items.Add("uk", "ukr");
items.Add("ur", "urd");
items.Add("uz", "uzb");
items.Add("ve", "ven");
items.Add("vi", "vie");
items.Add("vo", "vol");
items.Add("cy", "cym");
items.Add("wa", "wln");
items.Add("wo", "wol");
items.Add("xh", "xho");
items.Add("yi", "yid");
items.Add("yo", "yor");
items.Add("za", "zha");
items.Add("zu", "zul");
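
Comparing the two lists above, only 20 entries differ (plus the removed "bh"). As a rough sketch, those differences could be kept in one small table that converts the codes from the first list into the ISO 639-3 codes from the second. The class and method names (Iso639Normalizer, ToIso6393) are hypothetical and not part of the library.

```csharp
using System.Collections.Generic;

public static class Iso639Normalizer
{
    // The 20 entries where the first list (ISO 639-2 bibliographic codes)
    // differs from the second list (ISO 639-3 codes).
    private static readonly Dictionary<string, string> BibliographicToIso6393 =
        new Dictionary<string, string>
        {
            { "alb", "sqi" }, { "arm", "hye" }, { "baq", "eus" }, { "bur", "mya" },
            { "chi", "zho" }, { "cze", "ces" }, { "dut", "nld" }, { "fre", "fra" },
            { "geo", "kat" }, { "ger", "deu" }, { "gre", "ell" }, { "ice", "isl" },
            { "mac", "mkd" }, { "mao", "mri" }, { "may", "msa" }, { "per", "fas" },
            { "rum", "ron" }, { "slo", "slk" }, { "tib", "bod" }, { "wel", "cym" },
        };

    // Returns the ISO 639-3 code for a bibliographic code,
    // or the input unchanged when the two standards agree.
    public static string ToIso6393(string alpha3)
    {
        string mapped;
        return BibliographicToIso6393.TryGetValue(alpha3, out mapped) ? mapped : alpha3;
    }
}
```

For example, Iso639Normalizer.ToIso6393("ger") gives "deu", while "eng" is returned unchanged.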

CasperWSchmidt commented 1 year ago

IMHO the ISO 639-3 standard might as well be used from the beginning if the three-letter codes are added. This will require the opposite relation between two- and three-letter codes, though, as multiple ISO 639-3 codes map to the same ISO 639-1 code. Hence the ISO 639-3 codes must be the keys of the dictionary :)
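
A minimal sketch of that reversed dictionary could look like this. It assumes that an individual language such as "aeb" (Tunisian Arabic) or "arz" (Egyptian Arabic) is assigned the two-letter code of its macrolanguage; the class name LanguageCodeMap and the sample entries are illustrative only.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LanguageCodeMap
{
    // Keys are ISO 639-3 codes, values are ISO 639-1 codes (sample data, assumption).
    // Several three-letter keys may share one two-letter value.
    public static readonly IReadOnlyDictionary<string, string> Alpha3ToAlpha2 =
        new Dictionary<string, string>
        {
            { "ara", "ar" }, { "aeb", "ar" }, { "arz", "ar" },
            { "deu", "de" }, { "eng", "en" },
        };

    // Reverse view: every three-letter code grouped under its two-letter code.
    public static ILookup<string, string> Alpha2ToAlpha3
    {
        get { return Alpha3ToAlpha2.ToLookup(pair => pair.Value, pair => pair.Key); }
    }
}
```

Here LanguageCodeMap.Alpha3ToAlpha2["aeb"] yields "ar", and LanguageCodeMap.Alpha2ToAlpha3["ar"] enumerates all three Arabic entries, which a dictionary keyed by the two-letter code could not hold.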

IngBertolini commented 1 year ago

So we need this kind of mapping, where one of the ISO 639-1 languages can have multiple local languages. (site for reference)

Do you think the library should also manage every single local language, or is it enough if the mapping simply returns the macrolanguage? Example:

ILanguageTranslation language = new TranslationProvider().GetLanguage("aeb");

Should this return an instance of TunisianArabicLanguageTranslation, or is it sufficient that it returns an instance of ArabicLanguageTranslation?

I think that the second alternative should be fine!
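
A rough sketch of that second alternative, resolving a local language to its macrolanguage before the lookup. Everything here is hypothetical: LanguageResolver, GetLanguageName and both sample tables merely stand in for the real translation classes, which are not shown in this thread.

```csharp
using System.Collections.Generic;

public static class LanguageResolver
{
    // Sample data (assumption): individual language -> macrolanguage, both ISO 639-3.
    private static readonly Dictionary<string, string> MacrolanguageOf =
        new Dictionary<string, string> { { "aeb", "ara" }, { "arz", "ara" } };

    // Stand-in for the translation classes (assumption): code -> English name.
    private static readonly Dictionary<string, string> Names =
        new Dictionary<string, string> { { "ara", "Arabic" }, { "deu", "German" } };

    // Tries the code itself first, then its macrolanguage; returns null if neither is known.
    public static string GetLanguageName(string alpha3)
    {
        string name;
        if (Names.TryGetValue(alpha3, out name))
        {
            return name;
        }

        string macro;
        if (MacrolanguageOf.TryGetValue(alpha3, out macro) && Names.TryGetValue(macro, out name))
        {
            return name;
        }

        return null;
    }
}
```

With this fallback, "aeb" resolves to the Arabic entry without a dedicated Tunisian Arabic class, and one could be added later without changing callers.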

CasperWSchmidt commented 1 year ago

I'm not really into the translation stuff; all I care about are the language codes for each country :) But I believe the answer to your question depends on how much each "local" language differs from the macrolanguage (e.g. Portuguese and Spanish are spoken in both Europe and South America, so differences can be significant).

tinohager commented 1 year ago

Hi, does anyone want to suggest an implementation? Otherwise I will close the issue.

CasperWSchmidt commented 1 year ago

I would love to, but I'm afraid I have other tasks at hand with hard deadlines ATM :( If you keep it open, I might be able to take a stab at it in a few months though...