Rare Han characters

xfq commented 1 year ago

https://w3c.github.io/typography/#charset

Maybe we could mention rare Han characters here. In China, many people are unable to open a bank account online, buy train tickets online, or even buy cars and apartments because of uncommon characters in their names.

Problems include but are not limited to:

The rare character is not encoded in Unicode
The IMEs and fonts don’t support these rare characters
When GBK was defined, Unicode did not yet include these characters, so GBK assigned them codes in the user-defined area. After Unicode later encoded them, different input methods came to output different code points for the same character.
Different systems assign different PUA (Private Use Area) code points to some rare characters, so one character ends up with multiple code points. Because different systems also use different input methods, the same character can be entered as different Unicode code points, and name comparison across systems fails.
Because the rare character is not encoded, people have worked around the problem by writing it in pinyin instead: all capitals, initial capital only, all lower case, and other variants. This solves the problem temporarily, but the name then fails cross-system comparison.
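
To make the comparison failure concrete, here is a minimal Python sketch. It assumes a legacy system that stores a rare character at a PUA code point while another system stores the standard code point. The mapping table, the helper names, and the specific code points (U+E000 standing in for a legacy PUA assignment, U+4DAE for the encoded character) are hypothetical and chosen only for illustration. They are not taken from any real GBK or GB18030 mapping table.

```python
import unicodedata

# Hypothetical table: PUA code point emitted by some legacy system or IME
# mapped to the standard code point for the same character.
# This pair is illustrative only, not a real GBK/GB18030 mapping.
LEGACY_PUA_TO_STANDARD = {"\ue000": "\u4dae"}


def private_use_chars(name: str) -> list[str]:
    """Return the characters in `name` whose general category is Co (private use)."""
    return [ch for ch in name if unicodedata.category(ch) == "Co"]


def canonical_name(name: str) -> str:
    """Replace known legacy PUA code points, then apply NFC normalization."""
    mapped = "".join(LEGACY_PUA_TO_STANDARD.get(ch, ch) for ch in name)
    return unicodedata.normalize("NFC", mapped)


def same_name(a: str, b: str) -> bool:
    """Compare two names after mapping them to a common form."""
    return canonical_name(a) == canonical_name(b)


# The same two-character name as stored by two hypothetical systems.
name_system_a = "\u738b\ue000"  # second character stored as a PUA code point
name_system_b = "\u738b\u4dae"  # second character stored as a standard code point

naive_match = name_system_a == name_system_b          # plain string comparison
mapped_match = same_name(name_system_a, name_system_b)  # comparison after mapping
flagged = private_use_chars(name_system_a)            # PUA characters to review
```

A plain string comparison treats the two stored names as different, while the mapped comparison treats them as the same. This only works when every system agrees on the mapping table, which is exactly what is missing in practice. It also does nothing for names recorded in pinyin, since the pinyin form cannot be mapped back to a unique character.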

Things are getting better, but there are still gaps in support for rare Han characters.

xfq commented 1 year ago

We might want to consider writing a gap report for this. I'll record some relevant information in this issue.

xfq commented 1 year ago

Mobile phone manufacturers have recently started to implement GB18030-2022 level 3. The MiSans font family added 60,340 new characters to comply with the latest GB18030-2022 national standard.

There needs to be a free and open source font that supports all characters currently used in personal names. The new MiSans L3 font is an improvement, but it is not enough.

xfq commented 1 month ago

After the reorganization, the document no longer contains actual content and has become a list of links to the script resources documents, so I'll close this issue.

w3c / typography

Rare Han characters #86