skishore / makemeahanzi

Free, open-source Chinese character data
https://www.skishore.me/makemeahanzi/

How do other sites draw so many characters that aren't in these dictionaries? #110

Open myarcana opened 1 year ago

myarcana commented 1 year ago

How does https://www.an2.net/zi/ draw 㠭, 麤, and other rare, complex characters that aren't in MakeMeAHanzi?

wiogit commented 1 year ago

Every character in Unicode has some visual representation. In fact, they'll often have several variant forms from the various character encodings that existed before Unicode. All that site is doing is rendering the glyph with some font and adding its own background images to it. You might be able to find glyphs both inside and outside Unicode on GlyphWiki.
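The point that characters like 㠭 and 麤 are already encoded (so any font that covers them can display them) can be checked directly. Below is a minimal Python sketch using the standard `unicodedata` module. `describe_ideograph` is a hypothetical helper; it names only the basic CJK Unified Ideographs block and Extension A, and the answer depends on the Unicode version of your Python build. Being encoded says nothing about whether a particular font actually contains a glyph for the character.

```python
import unicodedata
from typing import Optional


def describe_ideograph(ch: str) -> Optional[str]:
    """Return 'U+XXXX (block)' for an encoded CJK unified ideograph, else None.

    Hypothetical helper: only the basic block and Extension A are named;
    any other CJK unified ideograph is lumped into "later extension".
    """
    # unicodedata names every unified ideograph "CJK UNIFIED IDEOGRAPH-XXXX";
    # characters without a name fall back to the default "".
    name = unicodedata.name(ch, "")
    if not name.startswith("CJK UNIFIED IDEOGRAPH-"):
        return None
    cp = ord(ch)
    if 0x4E00 <= cp <= 0x9FFF:
        block = "CJK Unified Ideographs"
    elif 0x3400 <= cp <= 0x4DBF:
        block = "Extension A"
    else:
        block = "later extension"
    return f"U+{cp:04X} ({block})"


for ch in "㠭麤":
    print(ch, describe_ideograph(ch))
```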

What I find most interesting is the origin of the decomposition data. It seems like CHISE compiled a lot of decomposition data, but for some reason the raw dataset is no longer accessible. Gavin Grover's CJK decomposition seems to have been done independently, since it uses very different composition labels. Perhaps it was done algorithmically? I don't know.

It seems like the decomposition data in MakeMeAHanzi is somewhat off: it's better in some ways and worse in others. Here are a few examples comparing Gavin Grover's data (GG), CJK-IDS, and MakeMeAHanzi (MMAH):

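For 不

| type | decomposition |
| --- | --- |
| GG | 不:d/t(丆,卜) |
| CJK-IDS | 不 ⿱一③ |
| MMAH | 不 ⿱一③ |

For 严

| type | decomposition |
| --- | --- |
| GG | 严:d/s(亚,厂) |
| CJK-IDS | 严 ⿳一④厂 |
| MMAH | 严 ⿻亚厂 |

For 丂

| type | decomposition |
| --- | --- |
| GG | 丂:d/t(㇐,㇉) |
| CJK-IDS | 丂 ⿱一㇉ |
| MMAH | 丂 ⿱一? |
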
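The CJK-IDS and MMAH entries above are Ideographic Description Sequences: a prefix notation where a description character such as ⿱ (above/below) or ⿻ (overlaid) is followed by its components. To compare sources programmatically, a minimal Python sketch of a parser follows. It assumes the original twelve description characters U+2FF0–U+2FFB, where ⿲ and ⿳ take three operands and the rest take two; newer ones added in later Unicode versions are not handled. The circled numbers in CJK-IDS appear to be placeholders for components without a code point of their own, and MMAH's `?` marks an unknown component; both are treated as opaque leaves. `parse_ids` and `leaves` are hypothetical helper names.

```python
TERNARY = {"\u2ff2", "\u2ff3"}  # ⿲ ⿳ take three operands
BINARY = {chr(c) for c in range(0x2FF0, 0x2FFC)} - TERNARY  # ⿰ ⿱ ⿴ ... ⿻


def parse_ids(ids: str):
    """Parse a prefix-notation IDS into nested tuples (operator, child, ...)."""
    chars = list(ids)  # iterates code points, so astral characters stay whole
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(chars):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = chars[pos]
        pos += 1
        if ch in BINARY or ch in TERNARY:
            arity = 3 if ch in TERNARY else 2
            return (ch, *[parse() for _ in range(arity)])
        # A component, a placeholder such as ③, or an unknown marker such as ?
        return ch

    tree = parse()
    if pos != len(chars):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def leaves(tree):
    """Flatten a parsed IDS tree into its components, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]
```

With this, `parse_ids("⿳一④厂")` and `parse_ids("⿻亚厂")` make the disagreement over 严 explicit: the sources differ both in the top-level operator and in the component list returned by `leaves`.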
myarcana commented 1 year ago

You're right! By "draw" I meant stroke order: the site knows the stroke order of many, many characters that I can't find data for anywhere else online, and draws their strokes one by one.
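The thread doesn't say where that site's stroke data comes from, but it is easy to list which characters MakeMeAHanzi itself lacks stroke data for. A minimal Python sketch, assuming MakeMeAHanzi's `graphics.txt` format of one JSON object per line with a `"character"` key and a `"strokes"` list of SVG path strings; `stroke_coverage` is a hypothetical helper, and the sample lines in any test are made up rather than taken from the real file.

```python
import json


def stroke_coverage(chars, graphics_lines):
    """Split chars into ({char: stroke_count}, [chars_without_data]).

    Assumes graphics_lines yields MakeMeAHanzi-style graphics.txt lines:
    one JSON object per line with "character" and "strokes" keys.
    """
    known = {}
    for line in graphics_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        known[entry["character"]] = len(entry.get("strokes", []))
    covered = {ch: known[ch] for ch in chars if ch in known}
    missing = [ch for ch in chars if ch not in known]
    return covered, missing
```

To use it on the real data, open `graphics.txt` with `encoding="utf-8"` and pass the file object as `graphics_lines`, e.g. `stroke_coverage("㠭麤", f)`; characters in the returned `missing` list are the ones for which another stroke-order source would be needed.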