keymanapp / keyboards

Open Source Keyman keyboards

chore: review changes to keyboard_info coming with v17 compiler #2311

Closed mcdurdin closed 1 year ago

mcdurdin commented 1 year ago

The v17.0 compiler has two minor differences in how it builds .keyboard_info files.


Platform support differences

The old compiler could not detect if a given keyboard was web or mobile -- so it erred on the side of listing both. The new compiler uses the &targets store consistently (for any keyboards that have a .kmx output). This means that a number of keyboards will no longer be listed as supporting mobileWeb, or android/ios/mobileWeb, or in a few cases, desktopWeb.

We need to verify that these changes are not going to cause trouble by making keyboards unavailable where they should be available.
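
A check like this can be scripted by comparing the platform lists in the old and new .keyboard_info files. The sketch below is only illustrative: it assumes each file carries a `platformSupport` object whose values are support levels such as "full" or "none", and the sample records are made up for the example (only the three dropped platforms for coptic_greek come from the table below).

```python
def supported_platforms(keyboard_info: dict) -> set[str]:
    """Return the platforms whose support level is anything other than 'none'."""
    # Assumption: platforms live in a "platformSupport" object.
    support = keyboard_info.get("platformSupport", {})
    return {name for name, level in support.items() if level != "none"}


def dropped_platforms(old_info: dict, new_info: dict) -> list[str]:
    """Platforms listed by the old compiler but no longer listed by the new one."""
    return sorted(supported_platforms(old_info) - supported_platforms(new_info))


# Hypothetical records standing in for the two compiler outputs.
old = {
    "id": "coptic_greek",
    "platformSupport": {
        "windows": "full",
        "desktopWeb": "full",
        "android": "full",
        "ios": "full",
        "mobileWeb": "full",
    },
}
new = {
    "id": "coptic_greek",
    "platformSupport": {"windows": "full", "desktopWeb": "full"},
}

print(dropped_platforms(old, new))
```

Running the same comparison over every keyboard would reproduce a table of the kind shown below, which can then be scanned by eye.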

| Keyboard ID | Unexpected Platforms |
|---|---|
| athinkra_vai | mobileWeb |
| athinkra_vai_typewriter | mobileWeb |
| basic_kbdcherp | android, ios, mobileWeb |
| basic_kbdlt2 | mobileWeb |
| basic_kbdsw09 | mobileWeb |
| bj_naskapi_classic | mobileWeb |
| coptic_greek | android, ios, mobileWeb |
| engram | mobileWeb |
| esperuni | mobileWeb |
| gandhari | mobileWeb |
| gff_geez | mobileWeb |
| gff_harege_fidelat | desktopWeb |
| gff_mesobe_fidelat | desktopWeb |
| itrans_bengali | android, ios, mobileWeb |
| itrans_devanagari_sanskrit_vedic | android, ios, mobileWeb |
| itrans_gujarati | android, ios, mobileWeb |
| korean_rr | mobileWeb |
| lao_2008_basic | android, ios, mobileWeb |
| maltese | mobileWeb |
| mozhi_malayalam | android, ios, mobileWeb |
| mro_phonetic | mobileWeb |
| myancode_san | android, ios, mobileWeb |
| nlci_bengali_winscript | android, ios, mobileWeb |
| nlci_gujarati_winscript | android, ios, mobileWeb |
| nlci_malayalam_winscript | android, ios, mobileWeb |
| nlci_tamil_winscript | android, ios, mobileWeb |
| nlci_telugu_winscript | android, ios, mobileWeb |
| nobonob | mobileWeb |
| sabdalipi_assamese | android, ios, mobileWeb |
| saraiki | mobileWeb |
| sil_cipher_music | mobileWeb |
| sil_dzongkha | mobileWeb |
| sil_indic_roman | mobileWeb |
| sil_khowar | mobileWeb |
| sil_lepcha | mobileWeb |
| sil_limbu_phonetic | android, ios, mobileWeb |
| sil_limbu_typewriter | android, ios, mobileWeb |
| sil_myanmar_mywinext | mobileWeb |
| sil_tai_dam | mobileWeb |
| sil_tai_dam_lao | mobileWeb |
| sil_tai_dam_latin | mobileWeb |
| sil_tai_dam_typewriter | android, ios, mobileWeb |

Language name differences

The old compiler used language-subtag-registry to determine language, script and region names. The new compiler makes use of langtags.json for language names, and Intl.DisplayNames for script and region names.

We should review the list of language names to ensure there are no problematic changes. Note that this data only affects the .keyboard_info files, not packages.

| Language | Keyboard ID | Old name | New name |
|---|---|---|---|
| aae | sil_euro_latin | Arbëreshë Albanian | Albanian, Arbëreshë |
| abc-Latn | sil_philippines | Ambala Ayta (Latin) | Ayta, Ambala (Latin) |
| abe-Latn | fv_wobanakiodwawogan | Western Abnaki (Latin) | Abenaki, Western (Latin) |
| add | sil_cameroon_azerty | Dzodinka | Lidzonka |
| add | sil_cameroon_qwerty | Dzodinka | Lidzonka |
| aeb-Arab | sil_tunisian | Tunisian Arabic (Arabic) | Arabic, Tunisian Spoken (Arabic) |
| agu | chalchiteko | Aguacateco | Awakateko |
| aln-Brai | malar_braille | Gheg Albanian (Braille) | Albanian, Gheg (Braille) |
| alu-Latn | sil_areare | Are'are (Latin) | ’Are’are (Latin) |
| arb-Arab | rac_arabic | Standard Arabic (Arabic) | Arabic, Standard (Arabic) |
| arq-Arab | sil_arabic_phonetic | Algerian Arabic (Arabic) | Arabic, Algerian Spoken (Arabic) |
| atb-Lisu | basic_kbdlisub | Zaiwa (Lisu) | Zaiwa (Fraser) |
| atb-Lisu | basic_kbdlisus | Zaiwa (Lisu) | Zaiwa (Fraser) |
| ay | sil_bolivia | Aymara | Aymara, Central |
| azj-Cyrl | basic_kbdaze | North Azerbaijani (Cyrillic) | Azerbaijani, North (Cyrillic) |
| azj-Latn | basic_kbdazst | North Azerbaijani (Latin) | Azerbaijani, North (Latin) |
| azj-Latn-AZ | basic_kbdazel | North Azerbaijani (Latin, Azerbaijan) | Azerbaijani, North (Latin, Azerbaijan) |
| ba-Cyrl | basic_kbdbash | Bashkir (Cyrillic) | Bashkort (Cyrillic) |
| bal | balochi_inpage | Baluchi | Balochi, Southern |
| bal | balochi_persian | Baluchi | Balochi, Southern |
| bal | balochi_phonetic | Baluchi | Balochi, Southern |
| bal | balochi_urdu | Baluchi | Balochi, Southern |
| bal-Latn | balochi_latin | Baluchi (Latin) | Balochi, Southern (Latin) |
| bal-Latn | balochi_scientific | Baluchi (Latin) | Balochi, Southern (Latin) |
| ban-Bali | aksarabali_panlex | Balinese (Balinese) | Bali (Balinese) |
| bgp-Arab | multi_pak_phonetic | Eastern Balochi (Arabic) | Balochi, Eastern (Arabic) |
| bh-Deva | itrans_devanagari_hindi | Bihari languages (Devanagari) | Bhojpuri (Devanagari) |
| bin | nailangs | Bini | Edo |
| bin-Latn | el_naija | Bini (Latin) | Edo (Latin) |
| bin-Latn | sil_pan_africa_mnemonic | Bini (Latin) | Edo (Latin) |
| bin-Latn | sil_pan_africa_positional | Bini (Latin) | Edo (Latin) |
| bla-Latn | fv_blackfoot | Siksika (Latin) | Blackfoot (Latin) |
| bm-Latn | clavbur9 | Bambara (Latin) | Bamanankan (Latin) |
| bm-Latn | sil_mali_azerty | Bambara (Latin) | Bamanankan (Latin) |
| bm-Latn | sil_mali_qwerty | Bambara (Latin) | Bamanankan (Latin) |
| bm-Latn | sil_mali_qwertz | Bambara (Latin) | Bamanankan (Latin) |
| bm-Nkoo | basic_kbdnko | Bambara (N’Ko) | Bamanankan (N’Ko) |
| bo-Tibt-CN | basic_kbdtiprd | Tibetan (Tibetan, China) | Tibetan, Central (Tibetan, China) |
| bqc-Latn | sil_busa | Boko (Benin) (Latin) | Boko (Latin) |
| bqp-Latn | sil_nigeria_odd_vowels | Busa (Latin) | Bisã (Latin) |
| brb-Khmr | sil_brao | Lave (Khmer) | Brao (Khmer) |
| bru-Latn | sil_bru | Eastern Bru (Latin) | Bru, Eastern (Latin) |
| bsc | sil_senegal_bsc_azerty | Bassari | Oniyan |
| btk-Batk | batak | Batak languages (Batak) | btk-Batk (Batak) |
| bug-Bugi | basic_kbdbug | Buginese (Buginese) | Bugis (Buginese) |
| bwe-Latn | sil_bwe_karen | Bwe Karen (Latin) | Karen, Bwe (Latin) |
| bwo-Latn | sil_el_ethiopian_latin | Boro (Ethiopia) (Latin) | Borna (Latin) |
| byn-Ethi | gff_blin | Bilin (Ethiopic) | Bilen (Ethiopic) |
| byn-Ethi | gff_ethiopic | Bilin (Ethiopic) | Bilen (Ethiopic) |
| bzw-Latn | sil_nigeria_dot | Basa (Nigeria) (Latin) | Basa (Latin) |
| caf-Cans | fv_southern_carrier | Southern Carrier (Unified Canadian Aboriginal Syllabics) | Carrier, Southern (Unified Canadian Aboriginal Syllabics) |
| caf-Latn | fv_dakelh | Southern Carrier (Latin) | Carrier, Southern (Latin) |
| caf-Latn | fv_natwits | Southern Carrier (Latin) | Carrier, Southern (Latin) |
| chn | chinuk_wawa | Chinook jargon | Chinook Wawa |
| chp-Cans | fv_dene_mb | Chipewyan (Unified Canadian Aboriginal Syllabics) | Dene (Unified Canadian Aboriginal Syllabics) |
| chp-Cans | fv_dene_nt | Chipewyan (Unified Canadian Aboriginal Syllabics) | Dene (Unified Canadian Aboriginal Syllabics) |
| chp-Latn | dene | Chipewyan (Latin) | Dene (Latin) |
| chp-Latn | fv_denesuline | Chipewyan (Latin) | Dene (Latin) |
| chp-Latn | fv_denesuline_epsilon | Chipewyan (Latin) | Dene (Latin) |
| ckb-Arab | basic_kbdkurd | Central Kurdish (Arabic) | Kurdish, Central (Arabic) |
| cmo-Khmr | sil_bunong | Central Mnong (Khmer) | Mnong, Central (Khmer) |
| cmo-Latn | dega | Central Mnong (Latin) | Mnong, Central (Latin) |
| cr | bj_cree_woods | Cree | Cree, Woods |
| cr-Latn | fv_cree_latin | Cree (Latin) | Cree, Woods (Latin) |
| crj | bj_cree_east_james_bay | Southern East Cree | Cree, Southern East |
| crk-Cans | fv_plains_cree | Plains Cree (Unified Canadian Aboriginal Syllabics) | Cree, Plains (Unified Canadian Aboriginal Syllabics) |
| crk-Cans | nrc_crk_cans | Plains Cree (Unified Canadian Aboriginal Syllabics) | Cree, Plains (Unified Canadian Aboriginal Syllabics) |
| crk-Latn | bj_cree_west_latn | Plains Cree (Latin) | Cree, Plains (Latin) |
| crl | bj_cree_east | Northern East Cree | Cree, Northern East |
| crl-Cans | fv_northern_east_cree | Northern East Cree (Unified Canadian Aboriginal Syllabics) | Cree, Northern East (Unified Canadian Aboriginal Syllabics) |
| crl-Latn | bj_cree_east_latn | Northern East Cree (Latin) | Cree, Northern East (Latin) |
| crm-Cans | fv_moose_cree | Moose Cree (Unified Canadian Aboriginal Syllabics) | Cree, Moose (Unified Canadian Aboriginal Syllabics) |
| csw | bj_mista_wasaha_cree | Swampy Cree | Cree, Swampy |
| csw-Cans | fv_swampy_cree | Swampy Cree (Unified Canadian Aboriginal Syllabics) | Cree, Swampy (Unified Canadian Aboriginal Syllabics) |
| de | basic_kbdgr | German | German, Standard |
| de | basic_kbdgr1 | German | German, Standard |
| de | basic_kbdsg | German | German, Standard |
| de | bu_phonetic | German | German, Standard |
| de-Runr | basic_kbdfthrk | German (Runic) | German, Standard (Runic) |
| de-Runr | runeboard | German (Runic) | German, Standard (Runic) |
| dgo-Arab-PK | rac_dogri | Dogri (individual language) (Arabic, Pakistan) | Dogri (Arabic, Pakistan) |
| dgr-Latn | fv_tlicho_yatii | Dogrib (Latin) | Tlicho (Latin) |
| din-Latn | el_dinka | Dinka (Latin) | Dinka, Southwestern (Latin) |
| doi-Dogr | dogra_inscript | Dogri (macrolanguage) (Dogra) | Dogri (Dogra) |
| dv | basic_kbddiv1 | Dhivehi | Maldivian |
| dv | basic_kbddiv2 | Dhivehi | Maldivian |
| ee-Latn | ghana | Ewe (Latin) | Éwé (Latin) |
| el | basic_kbdhe | Modern Greek (1453-) | Greek |
| el | basic_kbdhe220 | Modern Greek (1453-) | Greek |
| el | basic_kbdhe319 | Modern Greek (1453-) | Greek |
| el | basic_kbdhept | Modern Greek (1453-) | Greek |
| el | greekclassical | Modern Greek (1453-) | Greek |
| el-Latn | basic_kbdgkl | Modern Greek (1453-) (Latin) | Greek (Latin) |
| el-Latn | basic_kbdhela2 | Modern Greek (1453-) (Latin) | Greek (Latin) |
| el-Latn | basic_kbdhela3 | Modern Greek (1453-) (Latin) | Greek (Latin) |
| el-Latn | sil_hebr_grek_trans | Modern Greek (1453-) (Latin) | Greek (Latin) |
| emp-Latn | embera_north | Northern Emberá (Latin) | Emberá, Northern (Latin) |
| esg-Deva | gondi_dev | Aheri Gondi (Devanagari) | Gondi, Aheri (Devanagari) |
| esi | indigenous_nt | North Alaskan Inupiatun | Inupiatun, North Alaskan |
| ess-Cyrl | sil_yupik_cyrillic | Central Siberian Yupik (Cyrillic) | Yupik, Saint Lawrence Island (Cyrillic) |
| ess-Cyrl | sil_yupik_cyrillic_ru | Central Siberian Yupik (Cyrillic) | Yupik, Saint Lawrence Island (Cyrillic) |
| et | basic_kbdest | Estonian | Estonian, Standard |
| fa | basic_kbdfa | Persian | Persian, Iranian |
| fa | basic_kbdfar | Persian | Persian, Iranian |
| fa | farsiman | Persian | Persian, Iranian |
| fub-Arab | fulfulde_ajami_qwerty | Adamawa Fulfulde (Arabic) | Fulfulde, Adamawa (Arabic) |
| fub-Latn | fulfulde_latin_qwerty | Adamawa Fulfulde (Latin) | Fulfulde, Adamawa (Latin) |
| gbo-Latn | libtralo | Northern Grebo (Latin) | Grebo, Northern (Latin) |
| gn | basic_kbdgn | Guarani | Guaraní, Paraguayan |
| gon-Gonm | masaram_gondi | Gondi (Masaram Gondi) | Gondi, Northern (Masaram Gondi) |
| grc-Grek | galaxie_greek_hebrew_mnemonic | Ancient Greek (to 1453) (Greek) | Greek, Ancient (Greek) |
| grc-Grek | galaxie_greek_mnemonic | Ancient Greek (to 1453) (Greek) | Greek, Ancient (Greek) |
| grc-Grek | galaxie_greek_positional | Ancient Greek (to 1453) (Greek) | Greek, Ancient (Greek) |
| grc-Grek | sil_greek_polytonic | Ancient Greek (to 1453) (Greek) | Greek, Ancient (Greek) |
| gwc-Arab | rac_gawri | Kalami (Arabic) | Gawri (Arabic) |
| gwi-Latn | fv_gwichin | Gwichʼin (Latin) | Gwich’in (Latin) |
| hax-Latn | fv_hlgaagilda_xaayda_kil | Southern Haida (Latin) | Haida, Southern (Latin) |
| hbo-Hebr | galaxie_hebrew_mnemonic | Ancient Hebrew (Hebrew) | Hebrew, Ancient (Hebrew) |
| hbo-Hebr | galaxie_hebrew_positional | Ancient Hebrew (Hebrew) | Hebrew, Ancient (Hebrew) |
| hbo-Hebr | sil_hebrew | Ancient Hebrew (Hebrew) | Hebrew, Ancient (Hebrew) |
| hbo-Hebr | sil_hebrew_legacy | Ancient Hebrew (Hebrew) | Hebrew, Ancient (Hebrew) |
| hmd-Plrd | sil_hmd_plrd | Large Flowery Miao (Miao) | Miao, Large Flowery (Pollard Phonetic) |
| hnd-Arab | rac_hindko | Southern Hindko (Arabic) | Hindko, Southern (Arabic) |
| hno-Arab | basic_kbdurdu | Northern Hindko (Arabic) | Hindko, Northern (Arabic) |
| hsb | basic_kbdsorex | Upper Sorbian | Sorbian, Upper |
| hsb | basic_kbdsors1 | Upper Sorbian | Sorbian, Upper |
| ii | sil_yi | Sichuan Yi | Nuosu |
| ike-Cans | fv_eastern_canadian_inuktitut | Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics) | Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics) |
| ike-Cans | inuktitut_naqittaut | Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics) | Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics) |
| ike-Cans | inuktitut_pirurvik | Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics) | Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics) |
| ike-Latn | inuktitut_latin | Eastern Canadian Inuktitut (Latin) | Inuktitut, Eastern Canadian (Latin) |
| iri-Latn | sil_nigeria_underline | Irigwe (Latin) | Rigwe (Latin) |
| itc-Ital | basic_kbdoldit | Italic languages (Old Italic (Etruscan, Oscan, etc.)) | itc-Ital (Old Italic) |
| iu-Cans | basic_kbdinuk2 | Inuktitut (Unified Canadian Aboriginal Syllabics) | Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics) |
| iu-Cans | basic_kbdiulat | Inuktitut (Unified Canadian Aboriginal Syllabics) | Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics) |
| jmn-Latn | sil_makuri | Makuri Naga (Latin) | Naga, Makuri (Latin) |
| kaa | karakalpak_cyrillic | Kara-Kalpak | Karakalpak |
| kaa-Latn | karakalpak_latin | Kara-Kalpak (Latin) | Karakalpak (Latin) |
| kfi-Knda | nlci_kannada_winscript | Kannada Kurumba (Kannada) | Kurumba, Kannada (Kannada) |
| khk-Cyrl | basic_kbdmon | Halh Mongolian (Cyrillic) | Mongolian, Halh (Cyrillic) |
| khk-Cyrl-MN | mongolian_cyrillic_qwerty | Halh Mongolian (Cyrillic, Mongolia) | Mongolian, Halh (Cyrillic, Mongolia) |
| khk-Mong | basic_kbdmonmo | Halh Mongolian (Mongolian) | Mongolian, Halh (Mongolian) |
| khk-Mong | basic_kbdmonst | Halh Mongolian (Mongolian) | Mongolian, Halh (Mongolian) |
| khk-Phag-MN | basic_kbdphags | Halh Mongolian (Phags-pa, Mongolia) | Mongolian, Halh (Phags-pa, Mongolia) |
| kl | basic_kbdgrlnd | Kalaallisut | Greenlandic |
| km | basic_kbdkhmr | Central Khmer | Khmer |
| km | basic_kbdkni | Central Khmer | Khmer |
| km | khmer_advanced | Central Khmer | Khmer |
| km | khmer_angkor | Central Khmer | Khmer |
| km | sil_khmer | Central Khmer | Khmer |
| kmr-Arab | behdini_arab | Northern Kurdish (Arabic) | Kurdish, Northern (Arabic) |
| kmr-Arab | sorani_behdini_arab_qwerty | Northern Kurdish (Arabic) | Kurdish, Northern (Arabic) |
| ksw-Mymr | sil_sgaw_karen | S'gaw Karen (Myanmar) | Karen, S’gaw (Myanmar) |
| kvx-Arab | rac_parkari_koli | Parkari Koli (Arabic) | Koli, Parkari (Arabic) |
| kwk-Latn | fv_kwakwala | Kwakiutl (Latin) | Kwakwala (Latin) |
| kwk-Latn | fv_kwakwala_liqwala | Kwakiutl (Latin) | Kwakwala (Latin) |
| kxp-Arab | rac_wadiyara | Wadiyara Koli (Arabic) | Koli, Wadiyari (Arabic) |
| ky-Cyrl | basic_kbdkyr | Kirghiz (Cyrillic) | Kyrgyz (Cyrillic) |
| kyu | sil_kayah_kali | Western Kayah | Kayah, Western |
| kyu-Latn | sil_kayah_latn | Western Kayah (Latin) | Kayah, Western (Latin) |
| kyu-Mymr | sil_kayah_mymr | Western Kayah (Myanmar) | Kayah, Western (Myanmar) |
| lis-Lisu | sil_lisu_basic | Lisu (Lisu) | Lisu (Fraser) |
| lis-Lisu | sil_lisu_standard | Lisu (Lisu) | Lisu (Fraser) |
| lpo-Plrd | sil_lpo_plrd | Lipo (Miao) | Lipo (Pollard Phonetic) |
| lv | basic_kbdlv | Latvian | Latvian, Standard |
| lv | basic_kbdlv1 | Latvian | Latvian, Standard |
| lv | basic_kbdlvst | Latvian | Latvian, Standard |
| mad-Java | jawa | Madurese (Javanese) | Madura (Javanese) |
| mhi | sil_madi | Ma'di | Ma’di |
| mic-Latn | fv_migmaq | Mi'kmaq (Latin) | Mi’kmaq (Latin) |
| mid-Mand | mandaic_phonetic | Mandaic (Mandaic) | Mandaic (Mandaean) |
| mmo-Latn | sil_buang | Mangga Buang (Latin) | Buang, Mangga (Latin) |
| mni-Mtei | meitei_legacy | Manipuri (Meitei Mayek) | Meitei (Meitei Mayek) |
| moe-Latn | bj_innu | Montagnais (Latin) | Innu (Latin) |
| moe-Latn | bj_innu_phonemic | Montagnais (Latin) | Innu (Latin) |
| moe-Latn | fv_ilnu_innu_aimun | Montagnais (Latin) | Innu (Latin) |
| mos-Latn | sil_moore | Mossi (Latin) | Mòoré (Latin) |
| ms | indonesian_suku | Malay (macrolanguage) | Malay, Standard |
| mve-Arab | rac_marwari | Marwari (Pakistan) (Arabic) | Marwari (Arabic) |
| mvy-Arab | rac_indus_kohistani | Indus Kohistani (Arabic) | Kohistani, Indus (Arabic) |
| mym-Latn | me_en | Me'en (Latin) | Me’en (Latin) |
| ncg-Latn | fv_nisgaa | Nisga'a (Latin) | Nisga’a (Latin) |
| ne | basic_kbdnepr | Nepali (macrolanguage) | Nepali |
| ne | nepali_traditional | Nepali (macrolanguage) | Nepali |
| ne | sil_devanagari_romanized | Nepali (macrolanguage) | Nepali |
| ne | sil_devanagari_typewriter | Nepali (macrolanguage) | Nepali |
| new-Newa | newa_romanized | Newari (Newa) | Newar (Newa) |
| new-Newa | newa_traditional | Newari (Newa) | Newar (Newa) |
| nod-Lana | sil_boonkit | Northern Thai (Tai Tham) | Thai, Northern (Lanna) |
| nqo | nko | N'Ko | N’ko |
| nqo | sil_nko | N'Ko | N’ko |
| nsq | northern_sierra_miwok | Northern Sierra Miwok | Miwok, Northern Sierra |
| odk-Arab | rac_oadki | Od (Arabic) | Oadki (Arabic) |
| oj-Cans | fv_anishinaabemowin | Ojibwa (Unified Canadian Aboriginal Syllabics) | Ojibwa, Eastern (Unified Canadian Aboriginal Syllabics) |
| ojb-Cans | fv_ojibwa | Northwestern Ojibwa (Unified Canadian Aboriginal Syllabics) | Ojibwa, Northwestern (Unified Canadian Aboriginal Syllabics) |
| ojs | bj_oji_cree | Severn Ojibwa | Oji-Cree |
| ojs-Cans | fv_severn_ojibwa | Severn Ojibwa (Unified Canadian Aboriginal Syllabics) | Oji-Cree (Unified Canadian Aboriginal Syllabics) |
| or | basic_kbdinori | Oriya (macrolanguage) | Odia |
| or | itrans_odia | Oriya (macrolanguage) | Odia |
| or | nlci_oriya_winscript | Oriya (macrolanguage) | Odia |
| otk-Orkh | old_turkic_udw21_qwerty | Old Turkish (Old Turkic) | Old Turkish (Orkhon) |
| pa | basic_kbdinpun | Panjabi | Punjabi, Eastern |
| pa | itrans_gurmukhi | Panjabi | Punjabi, Eastern |
| pa | nlci_gurmukhi_winscript | Panjabi | Punjabi, Eastern |
| pes-Arab | persian_phonetic | Iranian Persian (Arabic) | Persian, Iranian (Arabic) |
| phl-Arab | rac_palula | Phalura (Arabic) | Palula (Arabic) |
| pkb-Latn | btl_kenya | Pokomo (Latin) | Kipfokomo (Latin) |
| pnb | sil_extended_urdu_np | Western Panjabi | Punjabi, Western |
| pnb-Arab | rac_western_punjabi | Western Panjabi (Arabic) | Punjabi, Western (Arabic) |
| pnb-Arab | sanjha_punjabi | Western Panjabi (Arabic) | Punjabi, Western (Arabic) |
| pnb-Arab | shahmukhi_phonetic | Western Panjabi (Arabic) | Punjabi, Western (Arabic) |
| png | naijatype | Pongu | Pangu |
| ps | basic_kbdpash | Pushto | Pashto, Northern |
| ps | rac_pashto | Pushto | Pashto, Northern |
| psi-Arab | rac_pashai | Southeast Pashai (Arabic) | Pashai, Southeast (Arabic) |
| rar-Latn | cim | Rarotongan (Latin) | Cook Islands Maori (Latin) |
| rar-Latn | el_pasifika | Rarotongan (Latin) | Cook Islands Maori (Latin) |
| rit-Latn | el_yolngu | Ritarungo (Latin) | Ritharrngu (Latin) |
| rwm-Latn | sil_eastern_congo | Amba (Uganda) (Latin) | Amba (Latin) |
| sah-Cyrl-RU | basic_kbdyak | Yakut (Cyrillic, Russian Federation) | Yakut (Cyrillic, Russia) |
| sat-Latn | santali_latin | Santali (Latin) | Santhali (Latin) |
| sat-Olck | basic_kbdolch | Santali (Ol Chiki) | Santhali (Ol Chiki) |
| scs-Latn | fv_kashogotine_yati | North Slavey (Latin) | Slavey, North (Latin) |
| scs-Latn | fv_sahugotine_yati | North Slavey (Latin) | Slavey, North (Latin) |
| scs-Latn | fv_shihgotine_yati | North Slavey (Latin) | Slavey, North (Latin) |
| se-Latn | basic_kbdfi1 | Northern Sami (Latin) | Saami, North (Latin) |
| se-Latn | basic_kbdno1 | Northern Sami (Latin) | Saami, North (Latin) |
| se-Latn | basic_kbdsmsfi | Northern Sami (Latin) | Saami, North (Latin) |
| se-Latn | basic_kbdsmsno | Northern Sami (Latin) | Saami, North (Latin) |
| shu-Latn | sil_tchad | Chadian Arabic (Latin) | Arabic, Chadian Spoken (Latin) |
| sl | basic_kbdcr | Slovenian | Slovene |
| sq | basic_kbdal | Albanian | Albanian, Tosk |
| srr | sil_senegal_srr_azerty | Serer | Serer-Sine |
| srr-Arab | srr_ajami_qwerty | Serer (Arabic) | Serer-Sine (Arabic) |
| stp-Latn | sil_tepehuan | Southeastern Tepehuan (Latin) | Tepehuan, Southeastern (Latin) |
| str | fv_sencoten | Straits Salish | Salish, Straits |
| su-Sund | sundanese | Sundanese (Sundanese) | Sunda (Sundanese) |
| sw | sil_uganda_tanzania | Swahili (macrolanguage) | Swahili |
| syc-Syrc | basic_kbdsyr1 | Classical Syriac (Syriac) | Syriac (Syriac) |
| syc-Syrc | basic_kbdsyr2 | Classical Syriac (Syriac) | Syriac (Syriac) |
| syl-Beng | sil_bengali_phonetic | Sylheti (Bengali) | Sylheti (Bangla) |
| syr-Syrc | aramaic_hebrew | Syriac (Syriac) | Chaldean Neo-Aramaic (Syriac) |
| tau-Latn | fv_neeaaneegn | Upper Tanana (Latin) | Tanana, Upper (Latin) |
| tce-Latn | fv_southern_tutchone | Southern Tutchone (Latin) | Tutchone, Southern (Latin) |
| ti | geezbrhan | Tigrinya | Tigrigna |
| ti-ER | gff_tigrinya_eritrea | Tigrinya (Eritrea) | Tigrigna (Eritrea) |
| ti-ET | gff_tigrinya_ethiopia | Tigrinya (Ethiopia) | Tigrigna (Ethiopia) |
| tig | gff_tigre | Tigre | Tigré |
| tig-Ethi | sil_ethiopic | Tigre (Ethiopic) | Tigré (Ethiopic) |
| tig-Ethi | sil_ethiopic_power_g | Tigre (Ethiopic) | Tigré (Ethiopic) |
| tl-Buhd | buhid | Tagalog (Buhid) | Filipino (Buhid) |
| tl-Hano | hanunoo | Tagalog (Hanunoo) | Filipino (Hanunoo) |
| tn | basic_kbdnso | Tswana | Setswana |
| ttm-Latn | fv_northern_tutchone | Northern Tutchone (Latin) | Tutchone, Northern (Latin) |
| ttq-Tfng | sil_tawallammat | Tawallammat Tamajaq (Tifinagh) | Tamajaq, Tawallammat (Tifinagh) |
| tzm | basic_kbdtzm | Central Atlas Tamazight | Tamazight, Central Atlas |
| tzm-Tfng | basic_kbdtifi2 | Central Atlas Tamazight (Tifinagh) | Tamazight, Central Atlas (Tifinagh) |
| tzm-Tfng-MA | basic_kbdtifi | Central Atlas Tamazight (Tifinagh, Morocco) | Tamazight, Central Atlas (Tifinagh, Morocco) |
| ug-Arab | basic_kbdughr | Uighur (Arabic) | Uyghur (Arabic) |
| ug-Arab | basic_kbdughr1 | Uighur (Arabic) | Uyghur (Arabic) |
| ug-Arab | rac_uyghur | Uighur (Arabic) | Uyghur (Arabic) |
| uzn-Cyrl | basic_kbduzb | Northern Uzbek (Cyrillic) | Uzbek, Northern (Cyrillic) |
| wsg-Gong | gondi_gunjala | Adilabad Gondi (Gunjala Gondi) | Gondi, Adilabad (Gunjala Gondi) |
| wsg-Telu | gondi_tel | Adilabad Gondi (Telugu) | Gondi, Adilabad (Telugu) |
| xmf-Geok | colchis_phonetic | Mingrelian (Khutsuri (Asomtavruli and Nuskhuri)) | Mingrelian (Georgian Khutsuri) |
| xnz-Copt | sil_nubian | Kenzi (Coptic) | Mattokki (Coptic) |
| xsl-Latn | fv_dene_zhatie | South Slavey (Latin) | Slavey, South (Latin) |
| ydg-Arab | rac_yidgha | Yidgha (Arabic) | Yadgha (Arabic) |
| ygp-Plrd | sil_ygp_plrd | Gepo (Miao) | Gepo (Pollard Phonetic) |
| yna-Plrd | sil_yna_plrd | Aluo (Miao) | Aluo (Pollard Phonetic) |
| ywq-Plrd | sil_ywq_plrd | Wuding-Luquan Yi (Miao) | Yi, Wuding-Luquan (Pollard Phonetic) |
| zlm-Latn | basic_kbdus | Malay (individual language) (Latin) | Malay (Latin) |

LornaSIL commented 1 year ago

@mcdurdin how do we know if a name is problematical? The newest langtags weeds out pejorative names. The new names use commas and the old names don't. Is that a problem?

LornaSIL commented 1 year ago

@mcdurdin

So what is the issue? Is it just that it's in the wrong order (should be desktop web)?

I looked at one other keyboard and it also had store(&TARGETS) 'web desktop'

If that is throwing it off, we should be able to just update the .kmn without making any version changes, correct?

mcdurdin commented 1 year ago

lang tags

how do we know if a name is problematical? The newest langtags weeds out pejorative names. The new names use commas and the old names don't. Is that a problem?

All super good questions @LornaSIL :grin: At this point, I think a quick sanity check is sufficient.

targets

  • Targets for athinkra_vai is store(&TARGETS) 'web desktop'
  • There is no .js file in the .kps
  • keyboard_info does not have anything about targets.
  • I downloaded the .kmp file and there is no .js file in the .kmp

So what is the issue? Is it just that it's in the wrong order (should be desktop web)?

Okay, perhaps my table was unclear. The 'unexpected platforms' column shows places where the old compiler was giving us targets such as mobileWeb which we probably don't want. The new compiler is giving us better data overall, and so the 'quick sanity check' here is probably just a scan down the column from your perspective to see if anything stands out as obviously wrong. I saw nothing wrong when I checked, so this table is as much for documentation of the change as anything.

LornaSIL commented 1 year ago

The names look fine. I looked at the targets and they all seemed correct, but I did a PR to tidy up all the targets statements to the minimal statement. No change to version numbers.

DavidLRowe commented 1 year ago

@mcdurdin Minor FYI re: (We are on 1.3.1, which is latest published version AFAICT)

The record has:

"api": "1.3.1",
"date": "2023-05-02",
"tag": "_version"

So the 1.3.1 is the version of the API and (hopefully) won't change too often. The date is from the last release. We hope to make another release tomorrow.

mcdurdin commented 1 year ago

Ah, gotcha! We are currently on 2023-05-04 and don't plan to update to the next version until the next major release now:

https://github.com/keymanapp/keyman/blob/master/resources/standards-data/langtags/langtags.json#L15-L19 currently shows:

    {
        "api": "1.3.1",
        "date": "2023-05-04",
        "tag": "_version"
    },

andjc commented 12 months ago

I am currently doing a rewrite of the Dinka keyboard and noticed this issue. I take it that you are moving from a BCP47 definition to a CLDR definition of the language subtags? If so can we use the -x- extension in language tags?


mcdurdin commented 12 months ago

I take it that you are moving from a BCP47 definition to a CLDR definition of the language subtags? If so can we use the -x- extension in language tags?

Not quite. First, we won't support -x- extensions until 18.0.

Second, our restriction was the lang-script-region subtag triplet because of various operating systems that didn't support more expressive tags. We are moving towards defining the best subtag for the keyboard, and gracefully degrading the subtag for those OSes that don't support arbitrary tags. It wasn't really a BCP47 vs CLDR thing.
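
As a rough illustration of what "gracefully degrading" a tag to the lang-script-region triplet could look like, here is a minimal sketch. It is not Keyman's implementation; `degrade_tag` is a hypothetical helper that simply keeps the language, script, and region subtags and drops variants, extensions, and private-use sequences.

```python
import re


def degrade_tag(tag: str) -> str:
    """Reduce a BCP 47 tag to language[-script][-region] (hypothetical helper)."""
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = None
    region = None
    for subtag in subtags[1:]:
        if len(subtag) == 1:
            # A singleton (e.g. "u" or "x") starts an extension or
            # private-use sequence; everything after it is dropped.
            break
        if script is None and region is None and re.fullmatch(r"[A-Za-z]{4}", subtag):
            script = subtag.title()
        elif region is None and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtag):
            region = subtag.upper()
        # Anything else (variants, extlang subtags) is ignored.
    return "-".join(part for part in (language, script, region) if part)


print(degrade_tag("de-Latn-DE-1996"))  # variant dropped
print(degrade_tag("en-x-custom"))      # private-use sequence dropped
```

An OS that accepts arbitrary tags would receive the original tag unchanged; only the restricted platforms would get the degraded form.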

andjc commented 12 months ago

Given this line in the message above:

din-Latn el_dinka Dinka (Latin) Dinka, Southwestern (Latin)

What would the language name for din-Latn resolve to? I ask since I will need to distinguish between Dinka (Latin) and Dinka, Southwestern (Latin).


mcdurdin commented 12 months ago

What would the language name for din-Latn resolve to? Since I will need to distinguish between Dinka (Latin) and Dinka, Southwestern (Latin)

I'm curious: what is the distinction that you need to work with? From langtags.json, din-Latn resolves to 'Dinka (Latin)'. Furthermore, the minimal tag is din (only Windows needs -Latn suffix).

    {
        "full": "din-Latn-SS",
        "iana": [ "Dinka" ],
        "iso639_3": "din",
        "localname": "Thuɔŋjäŋ",
        "localnames": [ "Thuɔŋjäŋ" ],
        "name": "Dinka",
        "names": [ "Dinka, Southwestern", "Thoŋ ë Muɔnyjäŋ", "Thuɔŋjäŋ", "Western Dinka" ],
        "region": "SS",
        "regionname": "South Sudan",
        "script": "Latn",
        "sldr": true,
        "tag": "din",
        "tags": [ "dik", "dik-Latn", "dik-Latn-SS", "dik-SS", "din-Latn", "din-SS" ],
        "windows": "din-Latn"
    }

andjc commented 12 months ago

Ahhh, it is using the CLDR definition.

In BCP-47 din is a macrolanguage

In CLDR din is equated with dik, with din as preferred form.

What I will need to do is distinguish between the unified orthography and the existing dialects, especially when it comes to the lexical models. So din would cover an orthography and grammar that is cross-dialectal, and the individual language codes, including dik, would represent the existing dialect-specific approaches. So I would need din and dik to be contrastive. But from the data you include above, din and dik are not contrastive, i.e. it follows the CLDR approach, where a macrolanguage code is equated with a specific language.

andjc commented 12 months ago

I guess I'd need to log an application for a new variant subtag for BCP-47, applied to all six language subtags. But can variant subtags be used in Keyman?

mcdurdin commented 12 months ago

@srl295, @DavidLRowe, thoughts?

mcdurdin commented 12 months ago

But can variant subtags be used in Keyman?

In v18 this will be possible. But let's see what others suggest first as well.

srl295 commented 12 months ago

@andjc I was also curious as to what you meant by "BCP47 vs CLDR". This clarifies somewhat. Encompassed languages are part of the BCP47 spec though, see https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1.2

@mcdurdin Are you saying that Keyman wouldn't allow a din keyboard contrasting with a dik keyboard?

My reading of BCP47, as pertaining to Dinka is that applications may (and CLDR locale data prefers to) use din (macro) to refer to the primary encompassed language, dik, but it also allows applications to choose to use the specific encompassed tags such as dik. So you could have data tagged dik, dip, diw etc.

But this is for a language, not an orthography. I think din vs dik could be used contrastively as to a language group vs. individuals, but I don't think it should be used contrastively for indicating an orthography distinction.

If the unified orthography is the expected default (i.e. what you get when you request bare din or even dik as languages), then what I'd recommend is a new subtag of some form for the pre-unified. Perhaps something similar to the following (which is a unified historical variant, so the opposite case in some sense).

Type: variant
Subtag: baku1926
Description: Unified Turkic Latin Alphabet (Historical)
Added: 2007-04-18
Prefix: az
Prefix: ba
Prefix: crh
Prefix: kk
Prefix: krc
Prefix: ky
Prefix: sah
Prefix: tk
Prefix: tt
Prefix: uz
Comments: Denotes alphabet used in Turkic republics/regions of the
  former USSR in late 1920s, and throughout 1930s, which aspired to
  represent equivalent phonemes in a unified fashion. Also known as: New
  Turkic Alphabet; Birlәşdirilmiş Jeni Tyrk
  Әlifbasь (Birlesdirilmis Jeni Tyrk Elifbasi);
  Jaŋalif (Janalif).

DavidLRowe commented 12 months ago

IIUC there are five languages that are identified with the name "Dinka":

  • dip Northeastern Dinka
  • diw Northwestern Dinka
  • dib South Central Dinka
  • dks Southeastern Dinka
  • dik Southwestern Dinka

In addition there is: din Dinka macrolanguage.

dik Southwestern Dinka is considered the representative language for din, and so din is used instead of (is preferred over) dik.

From Keyman's point of view, a keyboard for Southwestern Dinka should use din (rather than dik) as the BCP 47 code and (ideally) should include all the characters needed to type any of the other four languages included in the Dinka macrolanguage.

Steven mentioned section 4.1.2 of RFC 5646 which defines BCP 47. That does allow din-dik, din-dip, etc. as valid BCP 47 codes that are equivalent to dik, dip, etc. But I'm not sure that gets you any further. (And I don't know that Keyman would swallow them!)

I don't know if any of that is useful for your specific case.

mcdurdin commented 12 months ago

Note: reopening this issue so it is visible due to current conversation. We can close again once we are happy with the outcome, or move the conversation to a new issue.

I must admit after reading all this I still don't know the answers! This aspect of BCP47 breaks my brain every time I run across it.

@mcdurdin Are you saying that Keyman wouldn't allow a din keyboard contrasting with a dik keyboard?

Per langtags.json, as shown above, Keyman would normalize dik -> din.
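
To make the normalization concrete, here is a minimal sketch of an alias lookup over langtags-style records. The entry is an abridged copy of the din record quoted earlier in this thread; the `build_index` and `normalize` helpers are hypothetical and only illustrate the idea, not how Keyman actually performs the lookup.

```python
# Abridged copy of the din entry quoted earlier in this thread.
LANGTAGS = [
    {
        "full": "din-Latn-SS",
        "name": "Dinka",
        "tag": "din",
        "tags": ["dik", "dik-Latn", "dik-Latn-SS", "dik-SS", "din-Latn", "din-SS"],
        "windows": "din-Latn",
    },
]


def build_index(entries: list[dict]) -> dict[str, dict]:
    """Map every known spelling of a tag (lowercased) to its entry."""
    index = {}
    for entry in entries:
        for alias in [entry["tag"], entry["full"], *entry.get("tags", [])]:
            index[alias.lower()] = entry
    return index


def normalize(tag: str, index: dict[str, dict]) -> str:
    """Return the minimal tag for a known alias, or the input unchanged."""
    entry = index.get(tag.lower())
    return entry["tag"] if entry else tag


index = build_index(LANGTAGS)
print(normalize("dik", index))       # resolves to the din entry
print(normalize("din-Latn", index))  # same entry, minimal tag
print(index["dik"]["windows"])       # tag used on Windows
```

Because dik appears in the `tags` list of the din entry, both codes land on the same record, which is why they cannot be used contrastively with this data.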

andjc commented 12 months ago

There is no official orthography per se; unfortunately, there is no real National Language Policy.

In actual use in South Sudan and across the diaspora, you will find the pre-1990s orthography, the 1990s orthography, and more recently the Unified orthography and grammar.

There are no real corpora available. There are word frequency lists based on the Rek and Pandang Bibles. But the Bibles, if I remember correctly, are copyrighted, so the legality of the word lists is questionable. Both of these are based on the current (1990s) orthographies for each dialect. So far the Bor Bible hasn't been data-mined.

There is Wikipedia, but most of the articles would be based on the Unified orthography.

In terms of keyboards ... the orthographic variations don't really matter. Although the character repertoire needed for the Unified orthography and grammar is larger ... more exemplar characters. But that is neither here nor there in terms of language tagging and exposing the keyboard to users.

The real question is how to identify lexical models. Language tags will not work.


srl295 commented 12 months ago

@andjc I think what you are saying is that the unified orthography is the 'default' orthography going forward. In that case, I would think that keyboards, and lexical models, for unified orthography could use the following (trying to make a concrete proposal):

  • dip Northeastern Dinka
  • diw Northwestern Dinka
  • dib South Central Dinka
  • dks Southeastern Dinka
  • din for Southwestern Dinka (encompasses dik)

The exemplars do matter in principle for the various orthographies, but I hear you that pragmatically it's not going to make as much of a difference.

Then, for lexical models targeting prior orthographies, or other variations, I would use some kind of variant tag: (none of the below are registered currently, of course)

  • din-di1990 perhaps for a 1990s orthography Southwestern Dinka
  • dks-di1990 for 1990s orthography Southeastern Dinka
  • diw-rejaf for the 1928 (pre-1990s?) orthography (per Omniglot https://www.omniglot.com/writing/dinka.php)

Via the -u- extension it could perhaps be din-u-va-di1990 or diw-u-va-rejaf.

Edit: What I'm trying to say is that, generally, I'd support some other kind of tag as appropriate for the variations mentioned here. Yes, one can find examples of 3- and even 2-letter language codes that are arguably dialects or orthography distinctions of each other, but my understanding is that that isn't necessarily a justification for creating a new language code.

srl295 commented 12 months ago

@andjc If you'd like, you could consider filing a CLDR ticket with this use case to see if there would be CLDR-TC support or formal guidance on this use case (get some other BCP47 eyes on it), or support for an IANA variant registration.

andjc commented 12 months ago

Steven, I'd tend to go the other way: label the 1990s orthography as the default, and the unified one as the variant. At this point in the game it's hard to tell whether the unified orthography will become the de facto standard or not.

That also means minimal change, since everything currently language-tagged would remain the same, rather than everything suddenly becoming mistagged.

Yep, in terms of exemplar characters, the same keyboard will support all, at least in the case of this keyboard.
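For reference, the two tag shapes floated in this thread (a plain variant subtag such as din-di1990, or the -u-va- extension form such as din-u-va-di1990) can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the subtags di1990 and rejaf are not registered (as noted above), and the checks cover subtag length and character set only, not registry membership. The length rules follow RFC 5646 for variants (5 to 8 alphanumerics, or 4 starting with a digit) and the Unicode extension syntax for type values (3 to 8 alphanumerics).

```python
import re

# Variant subtag syntax per RFC 5646: 5-8 alphanumerics, or a digit plus 3.
VARIANT_RE = re.compile(r"^(?:[0-9a-z]{5,8}|[0-9][0-9a-z]{3})$")
# Type value syntax for the -u- extension: 3-8 alphanumerics.
U_TYPE_RE = re.compile(r"^[0-9a-z]{3,8}$")


def with_variant(language, variant):
    """Build a tag like din-di1990 (hypothetical, unregistered variant)."""
    variant = variant.lower()
    if not VARIANT_RE.match(variant):
        raise ValueError(f"not a well-formed variant subtag: {variant!r}")
    return f"{language.lower()}-{variant}"


def with_u_va(language, variant):
    """Build a tag like din-u-va-di1990 using the -u- extension key 'va'."""
    variant = variant.lower()
    if not U_TYPE_RE.match(variant):
        raise ValueError(f"not a well-formed -u- type value: {variant!r}")
    return f"{language.lower()}-u-va-{variant}"


def split_orthography(tag):
    """Split a tag of either shape into (language, orthography or None).

    Only the two shapes discussed in this thread are handled; tags with
    script, region, or other subtags raise ValueError.
    """
    parts = tag.lower().split("-")
    language, rest = parts[0], parts[1:]
    if not rest:
        return language, None
    if len(rest) == 1 and VARIANT_RE.match(rest[0]):
        return language, rest[0]
    if len(rest) == 3 and rest[:2] == ["u", "va"] and U_TYPE_RE.match(rest[2]):
        return language, rest[2]
    raise ValueError(f"unsupported tag shape: {tag!r}")


print(with_variant("din", "di1990"))
print(with_u_va("diw", "rejaf"))
print(split_orthography("din-u-va-di1990"))
```

Either shape keeps the base language code unchanged, so whichever orthography ends up as the default, existing din, dip, diw, dib, and dks tags would still match on the language part.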