Open imnasnainaec opened 11 months ago
Used ChatGPT to slap together a Python script to extract all `localname` and `localnames` characters from https://raw.githubusercontent.com/sillsdev/mui-language-picker/master/src/data/langtags.json (whose content is from https://github.com/silnrsi/langtags/blob/master/pub/langtags.json):
" ' , - : A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z ² À Á Ã Å È É Ê Ì Ð Ñ Ò Ó Ö à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø ù ú û ü ý ā ă ą ć Č č đ ē ĕ ė ę ě ħ Ĩ ĩ Ī ī ĭ ı ļ Ł ł Ŋ ŋ Ō ō ś ŝ ş š ũ ū ŭ ů ų ŵ Ž ž Ɓ Ɔ Ɗ Ə Ɨ ơ ǀ ǎ ǝ ǩ ǫ ȟ ȯ Ɂ ɐ ɓ ɔ ɗ ə ɛ ɣ ɨ ɩ ɬ ɵ ɽ ʉ ʋ ʌ ʔ ʷ ʹ ʻ ʼ ʾ ˀ ˊ ˯ ̀ ́ ̂ ̃ ̄ ̇ ̈ ̌ ̢ ̣ ̧ ̨ ̰ ̱ ̲ ̶ ́ Ε Ν Π ά έ ή ί α ε η ι κ λ ν ο ρ σ τ ό ϯ А Б Г Д З К М Н О С Т У Х Ц Ч Ш Ю Я а б в г д е ж з и й к л м н о п р с т у ф х ц ч ш ъ ы ь э я ѓ і ї ј ў ѣ ғ ҕ Қ қ ҡ ң ҧ ү ҷ ҹ Ӏ ӄ Ӈ ӈ ӏ ӑ ӗ ә ӣ ӧ ӹ Ԓ ԓ ա ե է հ մ յ ն տ ր ւ ְ ֲ ִ ַ ָ ֹ ּ ־ ׁ א ב ג ד ה ו ח י ל מ ן נ ס ע פ ק ר ש ת آ ؤ ئ ا ب ة ت ج ح خ د ذ ر ز س ش ص ط ع غ ف ق ك ل م ن ه و ى ي َ ُ ِ ْ ٛ ٜ ٲ ٽ پ چ ڈ ڌ ڍ ڑ ښ ڢ ک ڪ گ ھ ہ ۆ ۇ ۊ ی ێ ې ە ܐ ܘ ܝ ܠ ܢ ܣ ܪ ܫ ܬ ހ ބ ވ ދ ސ ަ ި ެ ް ँ ं ः अ आ इ ई ऊ क ख ग घ ङ च छ ज झ ट ठ ड ढ ण त थ द ध न प फ ब भ म य र ऱ ल ळ व श ष स ह ़ ा ि ी ु ू ृ े ै ॉ ो ौ ् ड़ ॱ ঁ ং অ ই উ ক গ চ ছ জ ট ড ণ ত দ ন প ব ভ ম য র ল শ স হ ় া ি ী ু ৃ ে ৈ ো ্ ৰ ਜ ਪ ਬ ਭ ਸ ਼ ਾ ੀ ੰ ં આ ક ગ ચ છ જ ડ ત દ મ ય ર વ સ ા િ ી ુ ્ ଆ ଇ ଉ ଓ କ ଙ ଜ ଟ ଡ ଣ ଦ ପ ବ ମ ର ଳ ଶ ସ ଼ ା ି ୀ ୁ େ ୋ ୍ இ க ச ட ண த ன ப ம ய ர ற ள ழ ா ி ு ெ ொ ௌ ் ం ఆ ఎ ఒ క గ జ డ త ద బ య ర ఱ ల వ స ా ి ు ె ొ ో ్ ಕ ಗ ಡ ತ ನ ಬ ಭ ಳ ವ ಷ ಾ ು ೆ ೊ ್ ം ക ഗ ഡ ണ പ മ യ റ ല ള ാ ി ു ് ං ල ස හ ි ก ข ค ง ช ซ ญ ต ถ ท น บ ป ผ พ ภ ม ย ร ล ว ษ ส อ ฮ ะ ั า ำ ิ ี ื ุ ู เ ใ ไ ่ ้ ์ ກ ຍ ຕ ບ ພ ມ ຣ ລ ວ ສ ຫ ອ າ ຶ ຸ ູ ໂ ້ ໌ ་ ། ཀ ཁ ག ང ཆ ད བ མ འ ཡ ར ལ ས ི ོ ྐ ྗ ྟ ྫ ྭ ྱ ྲ က ခ င စ ဆ ည တ ဒ န ပ ဖ ဘ မ ယ ရ လ ဝ သ အ ဢ ါ ာ ိ ီ ု ေ ဲ ံ ့ း ် ြ ွ ှ ၠ ၤ ၵ ၸ ႃ ႆ ႈ ႏ ႝ ა ე თ ი ლ ნ რ უ ქ შ ሃ ሊ ላ ል ሙ ማ ም ረ ሪ ራ ር ሮ ሰ ስ ሶ ቡ ባ ቤ ብ ቱ ታ ት ኃ ነ ን ኖ ኛ አ ኡ ኢ ኣ ኬ ኾ ወ ዋ ው ዓ ዕ ዝ የ ይ ዳ ዴ ጉ ጊ ጌ ግ ጎ ጚ ጛ ጤ ፈ ፋ ፟ ፡ Ꭶ Ꭹ Ꭿ Ꮃ Ꮒ Ꮝ Ꮧ Ꮳ Ꮼ ᐃ ᐄ ᐅ ᐊ ᐍ ᐏ ᐣ ᐤ ᐦ ᐧ ᐱ ᑎ ᑐ ᑦ ᑲ ᒃ ᒧ ᒨ ᓀ ᓂ ᓃ ᓄ ᓅ ᓇ ᓐ ᓕ ᓖ ᔅ ᔑ ᔨ ᔪ ᔫ ᔭ ᖅ ᖬ ខ គ ង ត ន ព ម រ ឞ ឹ ូ ួ ែ ំ ្ ៝ ᥑ ᥒ ᥖ ᥨ ᥬ ᥭ ᥰ ᥳ ᦅ ᦑ ᦟ ᦹ ᦺ ᧄ ᧉ ᱛ ᱟ ᱤ ᱥ ᱱ ᱲ ᴐ ᶉ ḇ ḍ Ḏ ḓ ḥ ṇ ṣ ṭ ṯ ṹ ạ ẹ Ẽ ẽ ế ệ Ị ị Ọ ọ ụ ỹ Ἑ ‑ ‘ ’ ” ‧ ⁴ ↄ ⲉ ⲏ ⲓ ⲙ ⲛ ⲣ ⲧ ⲭ ⴰ ⴳ ⴼ ⵃ ⵆ ⵉ ⵌ ⵍ ⵎ ⵏ ⵓ ⵔ ⵖ ⵛ ⵜ ⵡ ⵢ ⵣ ⵥ ア イ ウ グ タ チ ナ ヌ ー ㇰ 中 佒 壮 壯 徳 文 日 本 粤 粵 繁 語 语 靖 體 ꆈ ꉙ ꌠ ꓡ ꓢ ꓲ ꓴ ꔤ 
ꕙ ꞌ ꤊ ꤛ ꤜ ꤟ ꤢ ꤤ ꤬ ꤭ ꩫ ꩱ ꬃ 국 어 한 ﬞ ﯣ 𑃐 𑃚 𑃝 𑄋 𑄌 𑄟 𑄦 𑄳 𑄴 𞤆 𞤢 𞤤 𞤪 𞤵
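For reference, a minimal sketch of what such an extraction script could look like (not the actual ChatGPT-generated script). It assumes each entry in the JSON is an object where `localname` is a string and `localnames` is a list of strings; the function name and the tiny sample data are made up for illustration.

```python
import json
from typing import Iterable


def collect_name_chars(entries: Iterable[dict]) -> list[str]:
    """Return the sorted unique non-whitespace characters found in the
    `localname` and `localnames` fields of the given entries."""
    chars: set[str] = set()
    for entry in entries:
        names = []
        # Assumption: `localname`, when present, is a single string.
        if isinstance(entry.get("localname"), str):
            names.append(entry["localname"])
        # Assumption: `localnames`, when present, is a list of strings.
        names.extend(n for n in (entry.get("localnames") or []) if isinstance(n, str))
        for name in names:
            chars.update(c for c in name if not c.isspace())
    # Sorting by code point gives the same ordering as the list above.
    return sorted(chars)


# Made-up sample standing in for the real langtags.json contents.
sample = json.loads(
    '[{"tag": "xx", "localname": "Ab c"}, {"tag": "yy", "localnames": ["ça", "b"]}]'
)
print(" ".join(collect_name_chars(sample)))
```

To run it on the real data, load the downloaded file with `json.load` and pass the resulting list to `collect_name_chars`.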
The above characters are from the following Unicode ranges.
Below are the maximal Unicode ranges for each script that has characters outside of the blocks abbreviated as (+): Basic_Latin, Latin-1_Supplement, Latin_Extended-A, Latin_Extended-B, IPA_Extensions, Spacing_Modifier_Letters, Combining_Diacritical_Marks, Phonetic_Extensions, Phonetic_Extensions_Supplement, Latin_Extended_Additional, General_Punctuation, Superscripts_and_Subscripts, Number_Forms, Latin_Extended-D. Ranges in parentheses belong to the parenthesized blocks.
Greek_and_Coptic, Greek_Extended, Coptic, (+): 395-3ef, 1f19, 2c89-2cad, (41-74, 300-341, 1d10)
Cyrillic, Cyrillic_Supplement, (+): 410-513, (42-eb, 181-304, 2019, 201d)
Armenian: 561-582
Hebrew, Alphabetic_Presentation_Forms: 5b0-5ea, fb1e
Arabic, Arabic_Presentation_Forms-A, (+): 622-6d5, fbe3, (43-75, 202c)
Syriac: 710-72c
Thaana: 780-7b0
Devanagari, (General_Punctuation): 901-971, (200d)
Bengali: 981-9f0
Gurmukhi: a1c-a70
Gujarati: a82-acd
Oriya, (General_Punctuation): b06-b4d, (200c)
Tamil: b87-bcd
Telugu: c02-c4d
Kannada: c95-ccd
Malayalam: d02-d4d
Sinhala: d82-dd2
Thai: e01-e4c
Lao: e81-ecc
Tibetan: f0b-fb2
Myanmar, Myanmar_Extended-A: 1000-109d, aa6b, aa71
Georgian: 10d0-10e8
Ethiopic, Ethiopic_Extended-A, (Basic_Latin): 1203-1361, ab03, (44-77)
Cherokee: 13a6-13ec
Unified_Canadian_Aboriginal_Syllabics: 1403-15ac
Khmer, (Basic_Latin): 1781-17dd, (42-75)
Tai_Le: 1951-1973
New_Tai_Lue: 1985-19c9
Ol_Chiki: 1c5b-1c72
Tifinagh: 2d30-2d65
Katakana: 30a2-31f0
Yi_Syllables: a188-a320
Lisu: a4e1-a4f4
Vai: a524, a559
Kayah_Li: a90a-a92d
Hangul_Syllables: ad6d, c5b4, d55c
Sora_Sompeng: 110d0-110dd
Chakma: 1110b-11134
Adlam: 1e906-1e935
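A rough sketch of how ranges like the ones above can be computed: group the code points by Unicode block, then report the lowest and highest code point seen in each block. The `BLOCKS` table here is only a hand-copied excerpt (a full run would need the complete block list from the Unicode Character Database), and the function names are made up for illustration.

```python
from collections import defaultdict

# Hand-copied excerpt of Unicode block boundaries (name, first, last).
# Assumption: a real run would load the full list instead of this sample.
BLOCKS = [
    ("Basic_Latin", 0x0000, 0x007F),
    ("Armenian", 0x0530, 0x058F),
    ("Hebrew", 0x0590, 0x05FF),
    ("Thai", 0x0E00, 0x0E7F),
]


def block_of(ch: str) -> str:
    """Return the name of the block containing ch, or "Unknown"."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "Unknown"


def block_ranges(chars: str) -> dict[str, str]:
    """Map each block to the hex span from its lowest to highest code point."""
    seen: dict[str, list[int]] = defaultdict(list)
    for ch in chars:
        seen[block_of(ch)].append(ord(ch))
    result = {}
    for name, cps in seen.items():
        lo, hi = min(cps), max(cps)
        # A block with a single code point is reported without a dash.
        result[name] = f"{lo:x}" if lo == hi else f"{lo:x}-{hi:x}"
    return result


# A few of the Armenian and Thai characters from the list, plus "A".
sample = "\u0561\u0567\u0582" + "\u0e01\u0e32\u0e4c" + "A"
print(block_ranges(sample))
```

With the first and last Armenian and Thai characters from the list included, this reproduces the `561-582` and `e01-e4c` spans given above.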
Probably good font coverage according to https://github.com/silnrsi/langfontfinder/blob/main/data/script2font.csv
Noto Sans covers: Latin, Greek, Cyrillic, Devanagari.
The following have their own Noto Sans ___: [Coptic](https://en.wikipedia.org/wiki/Coptic_(Unicode_block)), [Armenian](https://en.wikipedia.org/wiki/Armenian_(Unicode_block)), [Hebrew](https://en.wikipedia.org/wiki/Hebrew_(Unicode_block)), [Arabic](https://en.wikipedia.org/wiki/Arabic_(Unicode_block)), [Syriac](https://en.wikipedia.org/wiki/Syriac_(Unicode_block)), [Thaana](https://en.wikipedia.org/wiki/Thaana_(Unicode_block)), [Bengali](https://en.wikipedia.org/wiki/Bengali_(Unicode_block)), [Gurmukhi](https://en.wikipedia.org/wiki/Gurmukhi_(Unicode_block)), [Gujarati](https://en.wikipedia.org/wiki/Gujarati_(Unicode_block)), [Oriya](https://en.wikipedia.org/wiki/Oriya_(Unicode_block)), [Tamil](https://en.wikipedia.org/wiki/Tamil_(Unicode_block)), [Telugu](https://en.wikipedia.org/wiki/Telugu_(Unicode_block)), [Kannada](https://en.wikipedia.org/wiki/Kannada_(Unicode_block)), [Malayalam](https://en.wikipedia.org/wiki/Malayalam_(Unicode_block)), [Sinhala](https://en.wikipedia.org/wiki/Sinhala_(Unicode_block)), [Thai](https://en.wikipedia.org/wiki/Thai_(Unicode_block)), [Lao](https://en.wikipedia.org/wiki/Lao_(Unicode_block)), [Myanmar](https://en.wikipedia.org/wiki/Myanmar_(Unicode_block)), [Georgian](https://en.wikipedia.org/wiki/Georgian_(Unicode_block)), [Ethiopic](https://en.wikipedia.org/wiki/Ethiopic_(Unicode_block)), [Cherokee](https://en.wikipedia.org/wiki/Cherokee_(Unicode_block)), Canadian Aboriginal, Khmer, Tai Le, New Tai Lue, Ol Chiki, Tifinagh, Yi, Lisu, Vai, Kayah Li, Sora Sompeng, Chakma, Adlam
Covered by Noto Sans JP, Noto Sans KR, Noto Sans SC, Noto Sans TC: Katakana, CJK Unified Ideographs, Hangul
Noto Serif Tibetan: Tibetan
Per https://developers.google.com/fonts/docs/getting_started:
... returns a 377 KB CSS file.
Below are the results from testing coverage of Noto Sans JP/KR/SC/TC on Katakana (10 characters), CJK_Unified (15 characters), and Hangul (3 characters):
JP: ア イ ウ グ タ チ ナ ヌ ー ㇰ 中 佒 壮 壯 徳 文 日 本 粤 □ 繁 語 □ 靖 體 □ □ □
KR: ア イ ウ グ タ チ ナ ヌ ー □ 中 □ □ 壯 □ 文 日 本 □ □ 繁 語 □ 靖 體 국 어 한
SC: ア イ ウ グ タ チ ナ ヌ ー □ 中 □ 壮 壯 徳 文 日 本 粤 粵 繁 語 语 靖 體 □ □ □
TC: ア イ ウ グ タ チ ナ ヌ ー □ 中 佒 □ 壯 □ 文 日 本 □ 粵 繁 語 □ 靖 體 □ □ □
So TC is redundant (every character it covers is also covered by JP, KR, or SC) and has been removed from the above link.
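The redundancy check can be reproduced from the test results with a few lines of Python. This is only a sketch of the set logic, with the rows copied from the results above (□ marks a character that rendered as tofu); the helper names are made up.

```python
# The 28 tested characters, in the same order as the result rows.
CHARS = "ア イ ウ グ タ チ ナ ヌ ー ㇰ 中 佒 壮 壯 徳 文 日 本 粤 粵 繁 語 语 靖 體 국 어 한".split()

# Result rows copied from the coverage test; "□" means the glyph was missing.
RESULTS = {
    "JP": "ア イ ウ グ タ チ ナ ヌ ー ㇰ 中 佒 壮 壯 徳 文 日 本 粤 □ 繁 語 □ 靖 體 □ □ □",
    "KR": "ア イ ウ グ タ チ ナ ヌ ー □ 中 □ □ 壯 □ 文 日 本 □ □ 繁 語 □ 靖 體 국 어 한",
    "SC": "ア イ ウ グ タ チ ナ ヌ ー □ 中 □ 壮 壯 徳 文 日 本 粤 粵 繁 語 语 靖 體 □ □ □",
    "TC": "ア イ ウ グ タ チ ナ ヌ ー □ 中 佒 □ 壯 □ 文 日 本 □ 粵 繁 語 □ 靖 體 □ □ □",
}


def covered(row: str) -> set[str]:
    """Return the set of tested characters that did not render as tofu."""
    return {ch for ch, shown in zip(CHARS, row.split()) if shown != "□"}


coverage = {font: covered(row) for font, row in RESULTS.items()}


def redundant_fonts(cov: dict[str, set[str]]) -> list[str]:
    """List fonts whose covered characters are all covered by the other fonts."""
    out = []
    for font, chars in cov.items():
        others = set().union(*(c for f, c in cov.items() if f != font))
        if chars <= others:
            out.append(font)
    return out


print(redundant_fonts(coverage))  # ['TC']
```

JP is needed for ㇰ, KR for the Hangul characters, and SC for 语, so only TC can be dropped, and JP, KR, and SC together still cover all 28 test characters.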
WS Tech is working on a font (inspired by https://github.com/santhoshtr/AutonymFont) to support precisely the autonyms present in the `langtags.json` that they maintain.
Here's the in-development WSTech script for generating said font: https://github.com/silnrsi/palaso-python/blob/master/scripts/font/autonyms.py
The "kyu" example doesn't show tofu anymore on QA or on thecombine.app. And more extensive spot tests yield no tofu.
@jmgrady Does this issue appear on the NUC and/or your offline Ubuntu deployments?
Yes, `kyu` shows tofu on the NUC. The language fonts installed are:
localLangList:
- "ar"
- "en"
- "es"
- "fr"
- "pt"
- "zh"
The MuiLanguagePicker has some characters that aren't supported by our default UI font. For example, see the results in a search for ~"yan"~ "kyu":