chore: review changes to keyboard_info coming with v17 compiler

mcdurdin commented 1 year ago

The v17.0 compiler has two minor differences in how it builds .keyboard_info files.

Platform support differences

The old compiler could not detect if a given keyboard was web or mobile -- so it erred on the side of listing both. The new compiler uses the &targets store consistently (for any keyboards that have a .kmx output). This means that a number of keyboards will no longer be listed as supporting mobileWeb, or android/ios/mobileWeb, or in a few cases, desktopWeb.

We need to verify that these changes are not going to cause trouble by making keyboards unavailable where they should be available.

Keyboard ID	Unexpected Platforms
athinkra_vai	mobileWeb
athinkra_vai_typewriter	mobileWeb
basic_kbdcherp	android, ios, mobileWeb
basic_kbdlt2	mobileWeb
basic_kbdsw09	mobileWeb
bj_naskapi_classic	mobileWeb
coptic_greek	android, ios, mobileWeb
engram	mobileWeb
esperuni	mobileWeb
gandhari	mobileWeb
gff_geez	mobileWeb
gff_harege_fidelat	desktopWeb
gff_mesobe_fidelat	desktopWeb
itrans_bengali	android, ios, mobileWeb
itrans_devanagari_sanskrit_vedic	android, ios, mobileWeb
itrans_gujarati	android, ios, mobileWeb
korean_rr	mobileWeb
lao_2008_basic	android, ios, mobileWeb
maltese	mobileWeb
mozhi_malayalam	android, ios, mobileWeb
mro_phonetic	mobileWeb
myancode_san	android, ios, mobileWeb
nlci_bengali_winscript	android, ios, mobileWeb
nlci_gujarati_winscript	android, ios, mobileWeb
nlci_malayalam_winscript	android, ios, mobileWeb
nlci_tamil_winscript	android, ios, mobileWeb
nlci_telugu_winscript	android, ios, mobileWeb
nobonob	mobileWeb
sabdalipi_assamese	android, ios, mobileWeb
saraiki	mobileWeb
sil_cipher_music	mobileWeb
sil_dzongkha	mobileWeb
sil_indic_roman	mobileWeb
sil_khowar	mobileWeb
sil_lepcha	mobileWeb
sil_limbu_phonetic	android, ios, mobileWeb
sil_limbu_typewriter	android, ios, mobileWeb
sil_myanmar_mywinext	mobileWeb
sil_tai_dam	mobileWeb
sil_tai_dam_lao	mobileWeb
sil_tai_dam_latin	mobileWeb
sil_tai_dam_typewriter	android, ios, mobileWeb
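
The check described above could be scripted roughly as follows. This is a minimal sketch, not the compiler's logic: it assumes each keyboard's platform list has already been extracted from the old and new .keyboard_info output into a set of platform names (the extraction step is not shown in this issue). The sample sets are hypothetical; only the differences between them are taken from the table.

```python
def dropped_platforms(old, new):
    """Return, per keyboard, the platforms listed in `old` but not in `new`."""
    report = {}
    for keyboard_id, old_platforms in old.items():
        missing = set(old_platforms) - set(new.get(keyboard_id, ()))
        if missing:
            report[keyboard_id] = sorted(missing)
    return report


# Hypothetical sample data: only the dropped platforms match the table above.
old_output = {
    "athinkra_vai": {"windows", "desktopWeb", "mobileWeb"},
    "coptic_greek": {"windows", "android", "ios", "mobileWeb"},
    "unchanged_example": {"windows"},
}
new_output = {
    "athinkra_vai": {"windows", "desktopWeb"},
    "coptic_greek": {"windows"},
    "unchanged_example": {"windows"},
}

report = dropped_platforms(old_output, new_output)
for keyboard_id, platforms in sorted(report.items()):
    print(keyboard_id, ", ".join(platforms), sep="\t")
```

Keyboards whose platform list is unchanged do not appear in the report, so the output has the same shape as the table.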

Language name differences

The old compiler used language-subtag-registry to determine language, script and region names. The new compiler makes use of langtags.json for language names, and Intl.DisplayNames for script and region names.

We should review the list of language names to ensure there are no problematic changes. Note that this data only affects the .keyboard_info files, not packages.

Language	Keyboard ID	Old name	New name
aae	sil_euro_latin	Arbëreshë Albanian	Albanian, Arbëreshë
abc-Latn	sil_philippines	Ambala Ayta (Latin)	Ayta, Ambala (Latin)
abe-Latn	fv_wobanakiodwawogan	Western Abnaki (Latin)	Abenaki, Western (Latin)
add	sil_cameroon_azerty	Dzodinka	Lidzonka
add	sil_cameroon_qwerty	Dzodinka	Lidzonka
aeb-Arab	sil_tunisian	Tunisian Arabic (Arabic)	Arabic, Tunisian Spoken (Arabic)
agu	chalchiteko	Aguacateco	Awakateko
aln-Brai	malar_braille	Gheg Albanian (Braille)	Albanian, Gheg (Braille)
alu-Latn	sil_areare	Are'are (Latin)	’Are’are (Latin)
arb-Arab	rac_arabic	Standard Arabic (Arabic)	Arabic, Standard (Arabic)
arq-Arab	sil_arabic_phonetic	Algerian Arabic (Arabic)	Arabic, Algerian Spoken (Arabic)
atb-Lisu	basic_kbdlisub	Zaiwa (Lisu)	Zaiwa (Fraser)
atb-Lisu	basic_kbdlisus	Zaiwa (Lisu)	Zaiwa (Fraser)
ay	sil_bolivia	Aymara	Aymara, Central
azj-Cyrl	basic_kbdaze	North Azerbaijani (Cyrillic)	Azerbaijani, North (Cyrillic)
azj-Latn	basic_kbdazst	North Azerbaijani (Latin)	Azerbaijani, North (Latin)
azj-Latn-AZ	basic_kbdazel	North Azerbaijani (Latin, Azerbaijan)	Azerbaijani, North (Latin, Azerbaijan)
ba-Cyrl	basic_kbdbash	Bashkir (Cyrillic)	Bashkort (Cyrillic)
bal	balochi_inpage	Baluchi	Balochi, Southern
bal	balochi_persian	Baluchi	Balochi, Southern
bal	balochi_phonetic	Baluchi	Balochi, Southern
bal	balochi_urdu	Baluchi	Balochi, Southern
bal-Latn	balochi_latin	Baluchi (Latin)	Balochi, Southern (Latin)
bal-Latn	balochi_scientific	Baluchi (Latin)	Balochi, Southern (Latin)
ban-Bali	aksarabali_panlex	Balinese (Balinese)	Bali (Balinese)
bgp-Arab	multi_pak_phonetic	Eastern Balochi (Arabic)	Balochi, Eastern (Arabic)
bh-Deva	itrans_devanagari_hindi	Bihari languages (Devanagari)	Bhojpuri (Devanagari)
bin	nailangs	Bini	Edo
bin-Latn	el_naija	Bini (Latin)	Edo (Latin)
bin-Latn	sil_pan_africa_mnemonic	Bini (Latin)	Edo (Latin)
bin-Latn	sil_pan_africa_positional	Bini (Latin)	Edo (Latin)
bla-Latn	fv_blackfoot	Siksika (Latin)	Blackfoot (Latin)
bm-Latn	clavbur9	Bambara (Latin)	Bamanankan (Latin)
bm-Latn	sil_mali_azerty	Bambara (Latin)	Bamanankan (Latin)
bm-Latn	sil_mali_qwerty	Bambara (Latin)	Bamanankan (Latin)
bm-Latn	sil_mali_qwertz	Bambara (Latin)	Bamanankan (Latin)
bm-Nkoo	basic_kbdnko	Bambara (N’Ko)	Bamanankan (N’Ko)
bo-Tibt-CN	basic_kbdtiprd	Tibetan (Tibetan, China)	Tibetan, Central (Tibetan, China)
bqc-Latn	sil_busa	Boko (Benin) (Latin)	Boko (Latin)
bqp-Latn	sil_nigeria_odd_vowels	Busa (Latin)	Bisã (Latin)
brb-Khmr	sil_brao	Lave (Khmer)	Brao (Khmer)
bru-Latn	sil_bru	Eastern Bru (Latin)	Bru, Eastern (Latin)
bsc	sil_senegal_bsc_azerty	Bassari	Oniyan
btk-Batk	batak	Batak languages (Batak)	btk-Batk (Batak)
bug-Bugi	basic_kbdbug	Buginese (Buginese)	Bugis (Buginese)
bwe-Latn	sil_bwe_karen	Bwe Karen (Latin)	Karen, Bwe (Latin)
bwo-Latn	sil_el_ethiopian_latin	Boro (Ethiopia) (Latin)	Borna (Latin)
byn-Ethi	gff_blin	Bilin (Ethiopic)	Bilen (Ethiopic)
byn-Ethi	gff_ethiopic	Bilin (Ethiopic)	Bilen (Ethiopic)
bzw-Latn	sil_nigeria_dot	Basa (Nigeria) (Latin)	Basa (Latin)
caf-Cans	fv_southern_carrier	Southern Carrier (Unified Canadian Aboriginal Syllabics)	Carrier, Southern (Unified Canadian Aboriginal Syllabics)
caf-Latn	fv_dakelh	Southern Carrier (Latin)	Carrier, Southern (Latin)
caf-Latn	fv_natwits	Southern Carrier (Latin)	Carrier, Southern (Latin)
chn	chinuk_wawa	Chinook jargon	Chinook Wawa
chp-Cans	fv_dene_mb	Chipewyan (Unified Canadian Aboriginal Syllabics)	Dene (Unified Canadian Aboriginal Syllabics)
chp-Cans	fv_dene_nt	Chipewyan (Unified Canadian Aboriginal Syllabics)	Dene (Unified Canadian Aboriginal Syllabics)
chp-Latn	dene	Chipewyan (Latin)	Dene (Latin)
chp-Latn	fv_denesuline	Chipewyan (Latin)	Dene (Latin)
chp-Latn	fv_denesuline_epsilon	Chipewyan (Latin)	Dene (Latin)
ckb-Arab	basic_kbdkurd	Central Kurdish (Arabic)	Kurdish, Central (Arabic)
cmo-Khmr	sil_bunong	Central Mnong (Khmer)	Mnong, Central (Khmer)
cmo-Latn	dega	Central Mnong (Latin)	Mnong, Central (Latin)
cr	bj_cree_woods	Cree	Cree, Woods
cr-Latn	fv_cree_latin	Cree (Latin)	Cree, Woods (Latin)
crj	bj_cree_east_james_bay	Southern East Cree	Cree, Southern East
crk-Cans	fv_plains_cree	Plains Cree (Unified Canadian Aboriginal Syllabics)	Cree, Plains (Unified Canadian Aboriginal Syllabics)
crk-Cans	nrc_crk_cans	Plains Cree (Unified Canadian Aboriginal Syllabics)	Cree, Plains (Unified Canadian Aboriginal Syllabics)
crk-Latn	bj_cree_west_latn	Plains Cree (Latin)	Cree, Plains (Latin)
crl	bj_cree_east	Northern East Cree	Cree, Northern East
crl-Cans	fv_northern_east_cree	Northern East Cree (Unified Canadian Aboriginal Syllabics)	Cree, Northern East (Unified Canadian Aboriginal Syllabics)
crl-Latn	bj_cree_east_latn	Northern East Cree (Latin)	Cree, Northern East (Latin)
crm-Cans	fv_moose_cree	Moose Cree (Unified Canadian Aboriginal Syllabics)	Cree, Moose (Unified Canadian Aboriginal Syllabics)
csw	bj_mista_wasaha_cree	Swampy Cree	Cree, Swampy
csw-Cans	fv_swampy_cree	Swampy Cree (Unified Canadian Aboriginal Syllabics)	Cree, Swampy (Unified Canadian Aboriginal Syllabics)
de	basic_kbdgr	German	German, Standard
de	basic_kbdgr1	German	German, Standard
de	basic_kbdsg	German	German, Standard
de	bu_phonetic	German	German, Standard
de-Runr	basic_kbdfthrk	German (Runic)	German, Standard (Runic)
de-Runr	runeboard	German (Runic)	German, Standard (Runic)
dgo-Arab-PK	rac_dogri	Dogri (individual language) (Arabic, Pakistan)	Dogri (Arabic, Pakistan)
dgr-Latn	fv_tlicho_yatii	Dogrib (Latin)	Tlicho (Latin)
din-Latn	el_dinka	Dinka (Latin)	Dinka, Southwestern (Latin)
doi-Dogr	dogra_inscript	Dogri (macrolanguage) (Dogra)	Dogri (Dogra)
dv	basic_kbddiv1	Dhivehi	Maldivian
dv	basic_kbddiv2	Dhivehi	Maldivian
ee-Latn	ghana	Ewe (Latin)	Éwé (Latin)
el	basic_kbdhe	Modern Greek (1453-)	Greek
el	basic_kbdhe220	Modern Greek (1453-)	Greek
el	basic_kbdhe319	Modern Greek (1453-)	Greek
el	basic_kbdhept	Modern Greek (1453-)	Greek
el	greekclassical	Modern Greek (1453-)	Greek
el-Latn	basic_kbdgkl	Modern Greek (1453-) (Latin)	Greek (Latin)
el-Latn	basic_kbdhela2	Modern Greek (1453-) (Latin)	Greek (Latin)
el-Latn	basic_kbdhela3	Modern Greek (1453-) (Latin)	Greek (Latin)
el-Latn	sil_hebr_grek_trans	Modern Greek (1453-) (Latin)	Greek (Latin)
emp-Latn	embera_north	Northern Emberá (Latin)	Emberá, Northern (Latin)
esg-Deva	gondi_dev	Aheri Gondi (Devanagari)	Gondi, Aheri (Devanagari)
esi	indigenous_nt	North Alaskan Inupiatun	Inupiatun, North Alaskan
ess-Cyrl	sil_yupik_cyrillic	Central Siberian Yupik (Cyrillic)	Yupik, Saint Lawrence Island (Cyrillic)
ess-Cyrl	sil_yupik_cyrillic_ru	Central Siberian Yupik (Cyrillic)	Yupik, Saint Lawrence Island (Cyrillic)
et	basic_kbdest	Estonian	Estonian, Standard
fa	basic_kbdfa	Persian	Persian, Iranian
fa	basic_kbdfar	Persian	Persian, Iranian
fa	farsiman	Persian	Persian, Iranian
fub-Arab	fulfulde_ajami_qwerty	Adamawa Fulfulde (Arabic)	Fulfulde, Adamawa (Arabic)
fub-Latn	fulfulde_latin_qwerty	Adamawa Fulfulde (Latin)	Fulfulde, Adamawa (Latin)
gbo-Latn	libtralo	Northern Grebo (Latin)	Grebo, Northern (Latin)
gn	basic_kbdgn	Guarani	Guaraní, Paraguayan
gon-Gonm	masaram_gondi	Gondi (Masaram Gondi)	Gondi, Northern (Masaram Gondi)
grc-Grek	galaxie_greek_hebrew_mnemonic	Ancient Greek (to 1453) (Greek)	Greek, Ancient (Greek)
grc-Grek	galaxie_greek_mnemonic	Ancient Greek (to 1453) (Greek)	Greek, Ancient (Greek)
grc-Grek	galaxie_greek_positional	Ancient Greek (to 1453) (Greek)	Greek, Ancient (Greek)
grc-Grek	sil_greek_polytonic	Ancient Greek (to 1453) (Greek)	Greek, Ancient (Greek)
gwc-Arab	rac_gawri	Kalami (Arabic)	Gawri (Arabic)
gwi-Latn	fv_gwichin	Gwichʼin (Latin)	Gwich’in (Latin)
hax-Latn	fv_hlgaagilda_xaayda_kil	Southern Haida (Latin)	Haida, Southern (Latin)
hbo-Hebr	galaxie_hebrew_mnemonic	Ancient Hebrew (Hebrew)	Hebrew, Ancient (Hebrew)
hbo-Hebr	galaxie_hebrew_positional	Ancient Hebrew (Hebrew)	Hebrew, Ancient (Hebrew)
hbo-Hebr	sil_hebrew	Ancient Hebrew (Hebrew)	Hebrew, Ancient (Hebrew)
hbo-Hebr	sil_hebrew_legacy	Ancient Hebrew (Hebrew)	Hebrew, Ancient (Hebrew)
hmd-Plrd	sil_hmd_plrd	Large Flowery Miao (Miao)	Miao, Large Flowery (Pollard Phonetic)
hnd-Arab	rac_hindko	Southern Hindko (Arabic)	Hindko, Southern (Arabic)
hno-Arab	basic_kbdurdu	Northern Hindko (Arabic)	Hindko, Northern (Arabic)
hsb	basic_kbdsorex	Upper Sorbian	Sorbian, Upper
hsb	basic_kbdsors1	Upper Sorbian	Sorbian, Upper
ii	sil_yi	Sichuan Yi	Nuosu
ike-Cans	fv_eastern_canadian_inuktitut	Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics)	Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
ike-Cans	inuktitut_naqittaut	Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics)	Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
ike-Cans	inuktitut_pirurvik	Eastern Canadian Inuktitut (Unified Canadian Aboriginal Syllabics)	Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
ike-Latn	inuktitut_latin	Eastern Canadian Inuktitut (Latin)	Inuktitut, Eastern Canadian (Latin)
iri-Latn	sil_nigeria_underline	Irigwe (Latin)	Rigwe (Latin)
itc-Ital	basic_kbdoldit	Italic languages (Old Italic (Etruscan, Oscan, etc.))	itc-Ital (Old Italic)
iu-Cans	basic_kbdinuk2	Inuktitut (Unified Canadian Aboriginal Syllabics)	Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
iu-Cans	basic_kbdiulat	Inuktitut (Unified Canadian Aboriginal Syllabics)	Inuktitut, Eastern Canadian (Unified Canadian Aboriginal Syllabics)
jmn-Latn	sil_makuri	Makuri Naga (Latin)	Naga, Makuri (Latin)
kaa	karakalpak_cyrillic	Kara-Kalpak	Karakalpak
kaa-Latn	karakalpak_latin	Kara-Kalpak (Latin)	Karakalpak (Latin)
kfi-Knda	nlci_kannada_winscript	Kannada Kurumba (Kannada)	Kurumba, Kannada (Kannada)
khk-Cyrl	basic_kbdmon	Halh Mongolian (Cyrillic)	Mongolian, Halh (Cyrillic)
khk-Cyrl-MN	mongolian_cyrillic_qwerty	Halh Mongolian (Cyrillic, Mongolia)	Mongolian, Halh (Cyrillic, Mongolia)
khk-Mong	basic_kbdmonmo	Halh Mongolian (Mongolian)	Mongolian, Halh (Mongolian)
khk-Mong	basic_kbdmonst	Halh Mongolian (Mongolian)	Mongolian, Halh (Mongolian)
khk-Phag-MN	basic_kbdphags	Halh Mongolian (Phags-pa, Mongolia)	Mongolian, Halh (Phags-pa, Mongolia)
kl	basic_kbdgrlnd	Kalaallisut	Greenlandic
km	basic_kbdkhmr	Central Khmer	Khmer
km	basic_kbdkni	Central Khmer	Khmer
km	khmer_advanced	Central Khmer	Khmer
km	khmer_angkor	Central Khmer	Khmer
km	sil_khmer	Central Khmer	Khmer
kmr-Arab	behdini_arab	Northern Kurdish (Arabic)	Kurdish, Northern (Arabic)
kmr-Arab	sorani_behdini_arab_qwerty	Northern Kurdish (Arabic)	Kurdish, Northern (Arabic)
ksw-Mymr	sil_sgaw_karen	S'gaw Karen (Myanmar)	Karen, S’gaw (Myanmar)
kvx-Arab	rac_parkari_koli	Parkari Koli (Arabic)	Koli, Parkari (Arabic)
kwk-Latn	fv_kwakwala	Kwakiutl (Latin)	Kwakwala (Latin)
kwk-Latn	fv_kwakwala_liqwala	Kwakiutl (Latin)	Kwakwala (Latin)
kxp-Arab	rac_wadiyara	Wadiyara Koli (Arabic)	Koli, Wadiyari (Arabic)
ky-Cyrl	basic_kbdkyr	Kirghiz (Cyrillic)	Kyrgyz (Cyrillic)
kyu	sil_kayah_kali	Western Kayah	Kayah, Western
kyu-Latn	sil_kayah_latn	Western Kayah (Latin)	Kayah, Western (Latin)
kyu-Mymr	sil_kayah_mymr	Western Kayah (Myanmar)	Kayah, Western (Myanmar)
lis-Lisu	sil_lisu_basic	Lisu (Lisu)	Lisu (Fraser)
lis-Lisu	sil_lisu_standard	Lisu (Lisu)	Lisu (Fraser)
lpo-Plrd	sil_lpo_plrd	Lipo (Miao)	Lipo (Pollard Phonetic)
lv	basic_kbdlv	Latvian	Latvian, Standard
lv	basic_kbdlv1	Latvian	Latvian, Standard
lv	basic_kbdlvst	Latvian	Latvian, Standard
mad-Java	jawa	Madurese (Javanese)	Madura (Javanese)
mhi	sil_madi	Ma'di	Ma’di
mic-Latn	fv_migmaq	Mi'kmaq (Latin)	Mi’kmaq (Latin)
mid-Mand	mandaic_phonetic	Mandaic (Mandaic)	Mandaic (Mandaean)
mmo-Latn	sil_buang	Mangga Buang (Latin)	Buang, Mangga (Latin)
mni-Mtei	meitei_legacy	Manipuri (Meitei Mayek)	Meitei (Meitei Mayek)
moe-Latn	bj_innu	Montagnais (Latin)	Innu (Latin)
moe-Latn	bj_innu_phonemic	Montagnais (Latin)	Innu (Latin)
moe-Latn	fv_ilnu_innu_aimun	Montagnais (Latin)	Innu (Latin)
mos-Latn	sil_moore	Mossi (Latin)	Mòoré (Latin)
ms	indonesian_suku	Malay (macrolanguage)	Malay, Standard
mve-Arab	rac_marwari	Marwari (Pakistan) (Arabic)	Marwari (Arabic)
mvy-Arab	rac_indus_kohistani	Indus Kohistani (Arabic)	Kohistani, Indus (Arabic)
mym-Latn	me_en	Me'en (Latin)	Me’en (Latin)
ncg-Latn	fv_nisgaa	Nisga'a (Latin)	Nisga’a (Latin)
ne	basic_kbdnepr	Nepali (macrolanguage)	Nepali
ne	nepali_traditional	Nepali (macrolanguage)	Nepali
ne	sil_devanagari_romanized	Nepali (macrolanguage)	Nepali
ne	sil_devanagari_typewriter	Nepali (macrolanguage)	Nepali
new-Newa	newa_romanized	Newari (Newa)	Newar (Newa)
new-Newa	newa_traditional	Newari (Newa)	Newar (Newa)
nod-Lana	sil_boonkit	Northern Thai (Tai Tham)	Thai, Northern (Lanna)
nqo	nko	N'Ko	N’ko
nqo	sil_nko	N'Ko	N’ko
nsq	northern_sierra_miwok	Northern Sierra Miwok	Miwok, Northern Sierra
odk-Arab	rac_oadki	Od (Arabic)	Oadki (Arabic)
oj-Cans	fv_anishinaabemowin	Ojibwa (Unified Canadian Aboriginal Syllabics)	Ojibwa, Eastern (Unified Canadian Aboriginal Syllabics)
ojb-Cans	fv_ojibwa	Northwestern Ojibwa (Unified Canadian Aboriginal Syllabics)	Ojibwa, Northwestern (Unified Canadian Aboriginal Syllabics)
ojs	bj_oji_cree	Severn Ojibwa	Oji-Cree
ojs-Cans	fv_severn_ojibwa	Severn Ojibwa (Unified Canadian Aboriginal Syllabics)	Oji-Cree (Unified Canadian Aboriginal Syllabics)
or	basic_kbdinori	Oriya (macrolanguage)	Odia
or	itrans_odia	Oriya (macrolanguage)	Odia
or	nlci_oriya_winscript	Oriya (macrolanguage)	Odia
otk-Orkh	old_turkic_udw21_qwerty	Old Turkish (Old Turkic)	Old Turkish (Orkhon)
pa	basic_kbdinpun	Panjabi	Punjabi, Eastern
pa	itrans_gurmukhi	Panjabi	Punjabi, Eastern
pa	nlci_gurmukhi_winscript	Panjabi	Punjabi, Eastern
pes-Arab	persian_phonetic	Iranian Persian (Arabic)	Persian, Iranian (Arabic)
phl-Arab	rac_palula	Phalura (Arabic)	Palula (Arabic)
pkb-Latn	btl_kenya	Pokomo (Latin)	Kipfokomo (Latin)
pnb	sil_extended_urdu_np	Western Panjabi	Punjabi, Western
pnb-Arab	rac_western_punjabi	Western Panjabi (Arabic)	Punjabi, Western (Arabic)
pnb-Arab	sanjha_punjabi	Western Panjabi (Arabic)	Punjabi, Western (Arabic)
pnb-Arab	shahmukhi_phonetic	Western Panjabi (Arabic)	Punjabi, Western (Arabic)
png	naijatype	Pongu	Pangu
ps	basic_kbdpash	Pushto	Pashto, Northern
ps	rac_pashto	Pushto	Pashto, Northern
psi-Arab	rac_pashai	Southeast Pashai (Arabic)	Pashai, Southeast (Arabic)
rar-Latn	cim	Rarotongan (Latin)	Cook Islands Maori (Latin)
rar-Latn	el_pasifika	Rarotongan (Latin)	Cook Islands Maori (Latin)
rit-Latn	el_yolngu	Ritarungo (Latin)	Ritharrngu (Latin)
rwm-Latn	sil_eastern_congo	Amba (Uganda) (Latin)	Amba (Latin)
sah-Cyrl-RU	basic_kbdyak	Yakut (Cyrillic, Russian Federation)	Yakut (Cyrillic, Russia)
sat-Latn	santali_latin	Santali (Latin)	Santhali (Latin)
sat-Olck	basic_kbdolch	Santali (Ol Chiki)	Santhali (Ol Chiki)
scs-Latn	fv_kashogotine_yati	North Slavey (Latin)	Slavey, North (Latin)
scs-Latn	fv_sahugotine_yati	North Slavey (Latin)	Slavey, North (Latin)
scs-Latn	fv_shihgotine_yati	North Slavey (Latin)	Slavey, North (Latin)
se-Latn	basic_kbdfi1	Northern Sami (Latin)	Saami, North (Latin)
se-Latn	basic_kbdno1	Northern Sami (Latin)	Saami, North (Latin)
se-Latn	basic_kbdsmsfi	Northern Sami (Latin)	Saami, North (Latin)
se-Latn	basic_kbdsmsno	Northern Sami (Latin)	Saami, North (Latin)
shu-Latn	sil_tchad	Chadian Arabic (Latin)	Arabic, Chadian Spoken (Latin)
sl	basic_kbdcr	Slovenian	Slovene
sq	basic_kbdal	Albanian	Albanian, Tosk
srr	sil_senegal_srr_azerty	Serer	Serer-Sine
srr-Arab	srr_ajami_qwerty	Serer (Arabic)	Serer-Sine (Arabic)
stp-Latn	sil_tepehuan	Southeastern Tepehuan (Latin)	Tepehuan, Southeastern (Latin)
str	fv_sencoten	Straits Salish	Salish, Straits
su-Sund	sundanese	Sundanese (Sundanese)	Sunda (Sundanese)
sw	sil_uganda_tanzania	Swahili (macrolanguage)	Swahili
syc-Syrc	basic_kbdsyr1	Classical Syriac (Syriac)	Syriac (Syriac)
syc-Syrc	basic_kbdsyr2	Classical Syriac (Syriac)	Syriac (Syriac)
syl-Beng	sil_bengali_phonetic	Sylheti (Bengali)	Sylheti (Bangla)
syr-Syrc	aramaic_hebrew	Syriac (Syriac)	Chaldean Neo-Aramaic (Syriac)
tau-Latn	fv_neeaaneegn	Upper Tanana (Latin)	Tanana, Upper (Latin)
tce-Latn	fv_southern_tutchone	Southern Tutchone (Latin)	Tutchone, Southern (Latin)
ti	geezbrhan	Tigrinya	Tigrigna
ti-ER	gff_tigrinya_eritrea	Tigrinya (Eritrea)	Tigrigna (Eritrea)
ti-ET	gff_tigrinya_ethiopia	Tigrinya (Ethiopia)	Tigrigna (Ethiopia)
tig	gff_tigre	Tigre	Tigré
tig-Ethi	sil_ethiopic	Tigre (Ethiopic)	Tigré (Ethiopic)
tig-Ethi	sil_ethiopic_power_g	Tigre (Ethiopic)	Tigré (Ethiopic)
tl-Buhd	buhid	Tagalog (Buhid)	Filipino (Buhid)
tl-Hano	hanunoo	Tagalog (Hanunoo)	Filipino (Hanunoo)
tn	basic_kbdnso	Tswana	Setswana
ttm-Latn	fv_northern_tutchone	Northern Tutchone (Latin)	Tutchone, Northern (Latin)
ttq-Tfng	sil_tawallammat	Tawallammat Tamajaq (Tifinagh)	Tamajaq, Tawallammat (Tifinagh)
tzm	basic_kbdtzm	Central Atlas Tamazight	Tamazight, Central Atlas
tzm-Tfng	basic_kbdtifi2	Central Atlas Tamazight (Tifinagh)	Tamazight, Central Atlas (Tifinagh)
tzm-Tfng-MA	basic_kbdtifi	Central Atlas Tamazight (Tifinagh, Morocco)	Tamazight, Central Atlas (Tifinagh, Morocco)
ug-Arab	basic_kbdughr	Uighur (Arabic)	Uyghur (Arabic)
ug-Arab	basic_kbdughr1	Uighur (Arabic)	Uyghur (Arabic)
ug-Arab	rac_uyghur	Uighur (Arabic)	Uyghur (Arabic)
uzn-Cyrl	basic_kbduzb	Northern Uzbek (Cyrillic)	Uzbek, Northern (Cyrillic)
wsg-Gong	gondi_gunjala	Adilabad Gondi (Gunjala Gondi)	Gondi, Adilabad (Gunjala Gondi)
wsg-Telu	gondi_tel	Adilabad Gondi (Telugu)	Gondi, Adilabad (Telugu)
xmf-Geok	colchis_phonetic	Mingrelian (Khutsuri (Asomtavruli and Nuskhuri))	Mingrelian (Georgian Khutsuri)
xnz-Copt	sil_nubian	Kenzi (Coptic)	Mattokki (Coptic)
xsl-Latn	fv_dene_zhatie	South Slavey (Latin)	Slavey, South (Latin)
ydg-Arab	rac_yidgha	Yidgha (Arabic)	Yadgha (Arabic)
ygp-Plrd	sil_ygp_plrd	Gepo (Miao)	Gepo (Pollard Phonetic)
yna-Plrd	sil_yna_plrd	Aluo (Miao)	Aluo (Pollard Phonetic)
ywq-Plrd	sil_ywq_plrd	Wuding-Luquan Yi (Miao)	Yi, Wuding-Luquan (Pollard Phonetic)
zlm-Latn	basic_kbdus	Malay (individual language) (Latin)	Malay (Latin)

LornaSIL commented 1 year ago

@mcdurdin how do we know if a name is problematical? The newest langtags weeds out pejorative names. The new names use commas and the old names don't. Is that a problem?

LornaSIL commented 1 year ago

@mcdurdin

Targets for athinkra_vai is store(&TARGETS) 'web desktop'
There is no .js file in the .kps
keyboard_info does not have anything about targets.
I downloaded the .kmp file and there is no .js file in the .kmp

So what is the issue? Is it just that it's in the wrong order (should be desktop web)?

I looked at one other keyboard and it also had store(&TARGETS) 'web desktop'

If that is throwing it off, we should be able to just update the .kmn without making any version changes, correct?

mcdurdin commented 1 year ago

lang tags

how do we know if a name is problematical? The newest langtags weeds out pejorative names. The new names use commas and the old names don't. Is that a problem?

All super good questions @LornaSIL :grin: At this point, I think a quick sanity check is sufficient.

Pejorative names should be excluded by this because it is using langtags.json, so that's good. (We are on 1.3.1, which is latest published version AFAICT)
I don't think commas are going to be a problem -- they just put the most significant part of the name first, which is good.
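
For the sanity check, it may help to separate rows where the new name is only the old name reordered around a comma from rows where the wording itself changed. The following is a rough sketch of such a filter; the function names are made up for illustration, and it only strips one trailing parenthesised qualifier, so old names with two parenthesised parts (such as "Boko (Benin) (Latin)") would need extra handling.

```python
import re

# One trailing "(Script)" or "(Script, Region)" qualifier.
_QUALIFIER = re.compile(r"\s*\([^()]*\)$")


def base_name(display_name):
    """Drop one trailing parenthesised script/region qualifier, if present."""
    return _QUALIFIER.sub("", display_name)


def is_comma_inversion(old_name, new_name):
    """True if new_name is old_name rewritten as 'Head, Modifier'."""
    head, sep, modifier = base_name(new_name).partition(", ")
    return bool(sep) and f"{modifier} {head}" == base_name(old_name)


# Rows copied from the table above.
rows = [
    ("Central Kurdish (Arabic)", "Kurdish, Central (Arabic)"),
    ("Western Abnaki (Latin)", "Abenaki, Western (Latin)"),
    ("German", "German, Standard"),
    ("Bini", "Edo"),
]
for old_name, new_name in rows:
    kind = "reordered only" if is_comma_inversion(old_name, new_name) else "needs a look"
    print(f"{old_name} -> {new_name}: {kind}")
```

Only the first row is reported as "reordered only"; the others differ in spelling, add a qualifier, or use a different name altogether.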

targets

Targets for athinkra_vai is store(&TARGETS) 'web desktop'

There is no .js file in the .kps

keyboard_info does not have anything about targets.

I downloaded the .kmp file and there is no .js file in the .kmp

So what is the issue? Is it just that it's in the wrong order (should be desktop web)?

Okay, perhaps my table was unclear. The 'unexpected platforms' column shows places where the old compiler was giving us targets such as mobileWeb which we probably don't want. The new compiler is giving us better data overall, and so the 'quick sanity check' here is probably just a scan down the column from your perspective to see if anything stands out as obviously wrong. I saw nothing wrong when I checked, so this table is as much for documentation of the change as anything.
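
On the ordering question, here is a small sketch of reading the statement as a set of targets. It assumes the &TARGETS value is a whitespace-separated list in which order carries no meaning; that is a reading of the exchange above, not something stated outright in this thread. The regular expression and function name are illustrative and are not taken from the compiler.

```python
import re

# Matches e.g.  store(&TARGETS) 'web desktop'  (single- or double-quoted value).
_TARGETS = re.compile(r"store\(\s*&targets\s*\)\s*(['\"])(.*?)\1", re.IGNORECASE)


def parse_targets(kmn_source):
    """Return the set of target names declared in .kmn source, or None if absent."""
    match = _TARGETS.search(kmn_source)
    if match is None:
        return None
    return set(match.group(2).split())


web_first = parse_targets("store(&TARGETS) 'web desktop'")
desktop_first = parse_targets("store(&TARGETS) 'desktop web'")
print(sorted(web_first))           # ['desktop', 'web']
print(web_first == desktop_first)  # True
```

Under that assumption, 'web desktop' and 'desktop web' describe the same set of targets.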

LornaSIL commented 1 year ago

The names look fine. I looked at the targets and they all seemed correct, but I opened a PR to tidy up all the &TARGETS statements so that each uses its minimal form. No change to version numbers.

DavidLRowe commented 1 year ago

@mcdurdin Minor FYI re: (We are on 1.3.1, which is latest published version AFAICT)

The record has:

"api": "1.3.1",
"date": "2023-05-02",
"tag": "_version"

So the 1.3.1 is the version of the API and (hopefully) won't change too often. The date is from the last release. We hope to make another release tomorrow.

mcdurdin commented 1 year ago

Ah, gotcha! We are currently on 2023-05-04 and don't plan to update to the next version until the next major release now:

https://github.com/keymanapp/keyman/blob/master/resources/standards-data/langtags/langtags.json#L15-L19 currently shows:

    {
        "api": "1.3.1",
        "date": "2023-05-04",
        "tag": "_version"
    },
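
To tell the API version apart from the data release date programmatically, something like the following would do. It is a sketch that assumes langtags.json is a top-level JSON array of records, as the quoted fragment suggests; the inline sample mirrors the record above, and the second record is a trimmed placeholder.

```python
import json


def langtags_version(records):
    """Return (api, date) from the record tagged '_version', or None if missing."""
    for record in records:
        if record.get("tag") == "_version":
            return record.get("api"), record.get("date")
    return None


# Inline sample; a real run would json.load() the langtags.json file instead.
sample = json.loads("""
[
    {"api": "1.3.1", "date": "2023-05-04", "tag": "_version"},
    {"tag": "din", "name": "Dinka"}
]
""")

api, date = langtags_version(sample)
print(f"API {api}, data released {date}")
```

The "api" field identifies the format and the "date" field identifies the data release, which is the distinction @DavidLRowe pointed out.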

andjc commented 12 months ago

I am currently rewriting the Dinka keyboard and noticed this issue. I take it that you are moving from a BCP47 definition to a CLDR definition of the language subtags? If so can we use the -x- extension in language tags?

mcdurdin commented 12 months ago

I take it that you are moving from a BCP47 definition to a CLDR definition of the language subtags? If so can we use the -x- extension in language tags?

Not quite. First, we won't support -x- extensions until 18.0.

Second, our restriction was the lang-script-region subtag triplet because of various operating systems that didn't support more expressive tags. We are moving towards defining the best subtag for the keyboard, and gracefully degrading the subtag for those OSes that don't support arbitrary tags. It wasn't really a BCP47 vs CLDR thing.
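
As an illustration of what "gracefully degrading" to the lang-script-region triplet could look like, here is a hypothetical sketch. It is not Keyman's implementation, which this thread does not describe. It handles only the common language[-script][-region] shape and ignores extlang subtags, grandfathered tags, and tags consisting solely of private use.

```python
def degrade_tag(tag):
    """Reduce a BCP 47 tag to at most language-script-region.

    Variants, extensions (-u-...) and private use (-x-...) are dropped.
    """
    subtags = tag.split("-")
    kept = [subtags[0].lower()]
    rest = subtags[1:]
    # Script: exactly four letters, conventionally title-cased.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        kept.append(rest.pop(0).title())
    # Region: two letters or three digits, conventionally upper-cased.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        kept.append(rest.pop(0).upper())
    return "-".join(kept)


for tag in ("din-Latn-SS", "din-x-unified", "az-Latn-baku1926", "din-u-va-di1990"):
    print(tag, "->", degrade_tag(tag))
```

A platform that accepts richer tags would receive the tag unchanged; only the more limited ones would get the reduced form.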

andjc commented 12 months ago

Given this line in the message above:

din-Latn el_dinka Dinka (Latin) Dinka, Southwestern (Latin)

What would the language name for din-Latn resolve to?

I ask because I will need to distinguish between Dinka (Latin) and Dinka, Southwestern (Latin).

mcdurdin commented 12 months ago

What would the language name for din-Latn resolve to? Since I will need to distinguish between Dinka (Latin) and Dinka, Southwestern (Latin)

I'm curious: what is the distinction that you need to work with? From langtags.json, din-Latn resolves to 'Dinka (Latin)'. Furthermore, the minimal tag is din (only Windows needs the -Latn suffix).

    {
        "full": "din-Latn-SS",
        "iana": [ "Dinka" ],
        "iso639_3": "din",
        "localname": "Thuɔŋjäŋ",
        "localnames": [ "Thuɔŋjäŋ" ],
        "name": "Dinka",
        "names": [ "Dinka, Southwestern", "Thoŋ ë Muɔnyjäŋ", "Thuɔŋjäŋ", "Western Dinka" ],
        "region": "SS",
        "regionname": "South Sudan",
        "script": "Latn",
        "sldr": true,
        "tag": "din",
        "tags": [ "dik", "dik-Latn", "dik-Latn-SS", "dik-SS", "din-Latn", "din-SS" ],
        "windows": "din-Latn"
    }
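
A lookup of this kind can be sketched as follows. The index maps every spelling listed in an entry ("tag", "full" and each item in "tags") back to that entry, which is why both din-Latn and dik end up at the same record with the minimal tag din. The entry is a trimmed copy of the one quoted above; the function name is made up, and this is not Keyman's actual code.

```python
def build_index(records):
    """Map every tag spelling of a record (lower-cased) to that record."""
    index = {}
    for record in records:
        spellings = [record["tag"], record.get("full"), *record.get("tags", [])]
        for spelling in spellings:
            if spelling:
                index[spelling.lower()] = record
    return index


# Trimmed copy of the entry quoted above.
DINKA = {
    "full": "din-Latn-SS",
    "name": "Dinka",
    "tag": "din",
    "tags": ["dik", "dik-Latn", "dik-Latn-SS", "dik-SS", "din-Latn", "din-SS"],
    "windows": "din-Latn",
}
index = build_index([DINKA])

for requested in ("din-Latn", "dik", "DIK-latn-ss"):
    record = index[requested.lower()]
    print(f"{requested}: name={record['name']!r}, "
          f"minimal={record['tag']}, windows={record['windows']}")
```

All three requests print the same name, minimal tag and Windows tag, because they resolve to the same entry.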

andjc commented 12 months ago

Ahhh, it is using the CLDR definition.

In BCP-47 din is a macrolanguage.

In CLDR din is equated with dik, with din as the preferred form.

What I will need to do is distinguish between the unified orthography and existing dialects, especially when it comes to the lexical models. So din would cover an orthography and grammar that is cross-dialectal, and the individual language codes, including dik, would represent the existing dialect-specific approaches. So I would need din and dik to be contrastive. But from the data you include above, din and dik are not contrastive, i.e. it follows the CLDR approach where a macrolanguage code is equated with a specific language.

andjc commented 12 months ago

I guess I'd need to log an application for a new variant subtag for BCP-47, applied to all six language subtags. But can variant subtags be used in Keyman?

mcdurdin commented 12 months ago

@srl295, @DavidLRowe, thoughts?

mcdurdin commented 12 months ago

But can variant subtags be used in Keyman?

In v18 this will be possible. But let's see what others suggest first as well.

srl295 commented 12 months ago

@andjc I was also curious as to what you meant by "BCP47 vs CLDR". This clarifies somewhat. Encompassed languages are part of the BCP47 spec though, see https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1.2

@mcdurdin Are you saying that Keyman wouldn't allow a din keyboard contrasting with a dik keyboard?

My reading of BCP47, as it pertains to Dinka, is that applications may (and CLDR locale data prefers to) use din (macro) to refer to the primary encompassed language, dik, but it also allows applications to choose to use the specific encompassed tags such as dik. So you could have data tagged dik, dip, diw etc.

But this is for a language, not an orthography. I think din vs dik could be used contrastively as to a language group vs. individuals, but I don't think it should be used contrastively for indicating an orthography distinction.

If the unified orthography is the expected default (i.e. what you get when you request bare din or even dik as languages), then what I'd recommend is a new subtag of some form for the pre-unified. Perhaps something similar to the following (which is a unified historical variant, so the opposite case in some sense).

Type: variant
Subtag: baku1926
Description: Unified Turkic Latin Alphabet (Historical)
Added: 2007-04-18
Prefix: az
Prefix: ba
Prefix: crh
Prefix: kk
Prefix: krc
Prefix: ky
Prefix: sah
Prefix: tk
Prefix: tt
Prefix: uz
Comments: Denotes alphabet used in Turkic republics/regions of the
  former USSR in late 1920s, and throughout 1930s, which aspired to
  represent equivalent phonemes in a unified fashion. Also known as: New
  Turkic Alphabet; Birlәşdirilmiş Jeni Tyrk
  Әlifbasь (Birlesdirilmis Jeni Tyrk Elifbasi);
  Jaŋalif (Janalif).
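
For anyone who wants to work with records like this one, here is a small parsing sketch. It assumes the layout visible above: "Field: value" lines, repeatable fields such as Prefix, and indented continuation lines. The sample record is trimmed, and the helper names are illustrative. Note that a prefix mismatch is treated here as "not the recommended usage" rather than as a malformed tag.

```python
from collections import defaultdict


def parse_record(text):
    """Parse one 'Field: value' record; indented lines continue the previous field."""
    fields = defaultdict(list)
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and last is not None:
            fields[last][-1] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last].append(value.strip())
    return dict(fields)


def prefix_matches(tag, record):
    """True if `tag` uses the record's variant after one of its Prefix values."""
    subtags = tag.lower().split("-")
    if record["Subtag"][0].lower() not in subtags[1:]:
        return False
    prefixes = [p.lower().split("-") for p in record["Prefix"]]
    return any(subtags[: len(p)] == p for p in prefixes)


# Trimmed copy of the record quoted above.
RECORD = parse_record("""\
Type: variant
Subtag: baku1926
Prefix: az
Prefix: tk
Comments: Denotes alphabet used in Turkic republics/regions of the
  former USSR in late 1920s
""")

print(RECORD["Prefix"])
print(prefix_matches("az-Latn-baku1926", RECORD))
print(prefix_matches("din-baku1926", RECORD))
```

A new variant registered for the Dinka codes would list those language subtags as its Prefix values in the same way.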

DavidLRowe commented 12 months ago

IIUC there are five languages that are identified with the name "Dinka":

dip Northeastern Dinka
diw Northwestern Dinka
dib South Central Dinka
dks Southeastern Dinka
dik Southwestern Dinka

In addition there is: din Dinka macrolanguage.

dik Southwestern Dinka is considered the representative language for din, and so din is used instead of (is preferred over) dik.

From Keyman's point of view, a keyboard for Southwestern Dinka should use din (rather than dik) as the BCP 47 code and (ideally) should include all the characters needed to type any of the other four languages included in the Dinka macrolanguage.

Steven mentioned section 4.1.2 of RFC 5646 which defines BCP 47. That does allow din-dik, din-dip, etc. as valid BCP 47 codes that are equivalent to dik, dip, etc. But I'm not sure that gets you any further. (And I don't know that Keyman would swallow them!)

I don't know if any of that is useful for your specific case.

mcdurdin commented 12 months ago

Note: reopening this issue so it is visible due to current conversation. We can close again once we are happy with the outcome, or move the conversation to a new issue.

I must admit after reading all this I still don't know the answers! This aspect of BCP47 breaks my brain every time I run across it.

@mcdurdin Are you saying that Keyman wouldn't allow a din keyboard contrasting with a dik keyboard?

Per langtags.json, as shown above, Keyman would normalize dik -> din.

andjc commented 12 months ago

There is no official orthography per se; unfortunately, there is no real National Language Policy.

In actual use in South Sudan and across the diaspora, you will find the pre-1990s orthography, the 1990s orthography, and more recently the Unified orthography and grammar.

There are no real corpora available. There are word frequency lists based on the Rek and Pandang Bibles. But the Bibles, if I remember correctly, are copyrighted, so the legality of the word lists is questionable. Both of these are based on the current (1990s) orthographies for each dialect. So far the Bor Bible hasn't been data-mined.

There is Wikipedia, but most of the articles would be based on the Unified orthography.

In terms of keyboards ... the orthographic variations don't really matter. Although the character repertoire needed for the Unified orthography and grammar is larger ... more exemplar characters. But that is neither here nor there in terms of language tagging and exposing the keyboard to users.

The real question is how to identify lexical models. Language tags will not work.

srl295 commented 12 months ago

@andjc I think what you are saying is that the unified orthography is the 'default' orthography going forward. In that case, I would think that keyboards, and lexical models, for unified orthography could use the following (trying to make a concrete proposal):

dip Northeastern Dinka
diw Northwestern Dinka
dib South Central Dinka
dks Southeastern Dinka
din Southwestern Dinka (encompasses dik)

The exemplars do matter in principle for the various orthographies, but I hear you that pragmatically it's not going to make as much of a difference.

Then, for lexical models targeting prior orthographies, or other variations, I would use some kind of variant tag (none of the below are currently registered, of course):

din-di1990 perhaps for 1990s-orthography Southwestern Dinka
dks-di1990 for 1990s-orthography Southeastern Dinka
diw-rejaf for the 1928 (pre-1990s?) orthography (per Omniglot, https://www.omniglot.com/writing/dinka.php)

Via the -u- extension, it could perhaps be din-u-va-di1990 or diw-u-va-rejaf.

Edit: What I'm trying to say is that, generally, I'd support some other kind of tag as appropriate for the variations mentioned here. Yes, one can find examples of 3- and even 2-letter language codes that are arguably dialects or orthography distinctions of each other, but my understanding is that that isn't necessarily a justification for the creation of a new language code.
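
As a rough check on the tags proposed above, the sketch below tests whether each one is syntactically well-formed. It assumes the variant subtag shape from RFC 5646 (5 to 8 alphanumerics, or a digit followed by 3 alphanumerics) and the key/type shape of the -u- extension (a 2-character key and a 3 to 8 character type). The function name is hypothetical, and the checker deliberately handles only the two shapes used in this proposal, `lang-variant` and `lang-u-key-type`. Well-formed does not mean valid: none of these variants is registered.

```python
import re

# Simplified subtag patterns (assumed from the RFC 5646 / -u- extension syntax).
LANGUAGE = re.compile(r"[A-Za-z]{2,3}")
VARIANT = re.compile(r"[0-9A-Za-z]{5,8}|[0-9][0-9A-Za-z]{3}")
U_KEY = re.compile(r"[0-9A-Za-z]{2}")
U_TYPE = re.compile(r"[0-9A-Za-z]{3,8}")


def is_wellformed_proposal(tag: str) -> bool:
    """Check 'lang-variant' or 'lang-u-key-type' for syntactic shape only.

    This is not a full BCP 47 parser and does not consult any registry.
    """
    parts = tag.split("-")
    if not LANGUAGE.fullmatch(parts[0]):
        return False
    rest = parts[1:]
    if len(rest) == 1:
        # Single trailing subtag: treat it as a variant.
        return VARIANT.fullmatch(rest[0]) is not None
    if len(rest) == 3 and rest[0].lower() == "u":
        # -u- extension with exactly one key and one type.
        return (U_KEY.fullmatch(rest[1]) is not None
                and U_TYPE.fullmatch(rest[2]) is not None)
    return False


if __name__ == "__main__":
    for tag in ["din-di1990", "dks-di1990", "diw-rejaf",
                "din-u-va-di1990", "diw-u-va-rejaf"]:
        print(tag, is_wellformed_proposal(tag))
```

All five proposed tags pass this shape check, so the remaining questions are registration and whether Keyman's tooling would accept them.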

srl295 commented 12 months ago

@andjc If you'd like, you could consider filing a CLDR ticket with this use case to see if there would be CLDR-TC support or formal guidance on this use case (get some other BCP47 eyes on it), or support for an IANA variant registration.

andjc commented 12 months ago

Steven, I'd tend to go the other way: leave the 1990s orthography as the default, and the unified as a variant. At this point in the game it's hard to tell if the unified will become the de facto standard or not.

That also means minimal change, since everything currently language-tagged would remain the same, rather than everything suddenly becoming mistagged.

Yep, in terms of exemplar characters, the same keyboard will support all, at least in the case of this keyboard.
