parsimonhi / animCJK

Draw animated Japanese characters (Kanji and Kana), Korean characters (Hanja) and Chinese characters (Hanzi) in correct stroke order using svg, free open-source code.

Missing Kangxi radicals (as characters) #13

Closed hugolpz closed 2 years ago

hugolpz commented 3 years ago

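Kangxi radicals are encoded twice in Unicode: once in the dedicated Kangxi Radicals block (U+2F00 to U+2FD5) and once as ordinary CJK unified ideographs. (Danger: do not mix them up!)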
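To illustrate the double encoding, here is a minimal Python sketch. It assumes that the compatibility mapping Unicode defines for the Kangxi Radicals block (applied by NFKC normalization) is an acceptable way to get from the radical code point to the "true character" code point. The helper names `is_kangxi_radical` and `to_unified` are made up for this example and are not part of animCJK.

```python
import unicodedata

# Bounds of the Unicode "Kangxi Radicals" block.
KANGXI_FIRST, KANGXI_LAST = 0x2F00, 0x2FD5


def is_kangxi_radical(char: str) -> bool:
    """Return True if char lies in the Kangxi Radicals block."""
    return KANGXI_FIRST <= ord(char) <= KANGXI_LAST


def to_unified(char: str) -> str:
    """Map a Kangxi radical code point to its unified ideograph.

    Uses the compatibility decomposition applied by NFKC.
    Characters outside the block are returned unchanged.
    """
    if is_kangxi_radical(char):
        return unicodedata.normalize("NFKC", char)
    return char


if __name__ == "__main__":
    radical = "\u2f03"  # KANGXI RADICAL SLASH
    unified = to_unified(radical)
    print(f"U+{ord(radical):04X} -> U+{ord(unified):04X} {unified}")
```

The two code points usually look identical on screen, so a check like `is_kangxi_radical` is a cheap way to detect which one a list actually contains.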

AnimCJK works on true characters only, as do many fonts. I used animCJK to produce .gif files for Kangxi radicals as characters. AnimCJK covers 168 radicals as characters. The following 46 items are missing (i.e. absent from all locales):

  1. 丿

(I'm especially interested in the last 20.)

Questions

Note

I compared 2 files, sorted, with one character per line in each file and no empty lines.

    # List the gif files derived from AnimCJK, strip the "-sbs.gif" suffix, sort.
    ls -1 *.gif | sed 's/-sbs\.gif$//' | sort > ./exist-from-animCJK.md
    # Keep the lines found only in the full list (both files must be sorted the same way).
    comm -23 radicals-all.md ./exist-from-animCJK.md > ./missing.md
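The same comparison can be sketched in Python, which avoids depending on the shell's locale for sorting. This is only an illustration: the function names are invented here, and the `-sbs.gif` suffix and one-character-per-line file layout are taken from the commands above.

```python
from pathlib import Path

# Suffix of the generated gif files, as stripped by the sed command.
SUFFIX = "-sbs.gif"


def chars_from_gifs(filenames):
    """Strip the suffix from matching file names and sort by code point."""
    return sorted(
        name[: -len(SUFFIX)] for name in filenames if name.endswith(SUFFIX)
    )


def missing(all_chars, existing_chars):
    """Return the items of all_chars absent from existing_chars, in order."""
    existing = set(existing_chars)
    return [c for c in all_chars if c not in existing]


def list_missing(gif_dir, radicals_file):
    """Compare a directory of gif files against a one-character-per-line list."""
    all_chars = Path(radicals_file).read_text(encoding="utf-8").split()
    existing = chars_from_gifs(p.name for p in Path(gif_dir).glob("*.gif"))
    return missing(all_chars, existing)
```

Calling `list_missing(".", "radicals-all.md")` would then return the same characters that the shell pipeline writes to `missing.md`.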
hugolpz commented 3 years ago

@skishore

parsimonhi commented 3 years ago

I didn't include all the radicals in the official animCJK release because I haven't finished the job properly.

However, I have already done most of the work. See for instance http://gooo.free.fr/animCJK/all.php and enter the missing radicals in the input field to check whether the result would be OK for you (you can enter several characters at once, up to 40).

What is missing for the moment:

  1. Checking the Japanese stroke order (which means I have to make two versions of the characters).

  2. Adding a brush effect at the beginning and the end of the strokes. I wrote an algorithm to do it automatically, but I need to verify the result before adding the corresponding characters to the official release of animCJK, because it is not always perfect. Check the brush checkbox in http://gooo.free.fr/animCJK/all.php to see what my algorithm does.

  3. I don't know when I will complete these tasks.

hugolpz commented 3 years ago
  1. I see. So maybe I should plug my fork into your data. 👍🏼 EDIT: I see 齒 / 40786.svg on your site but can't find 40786.svg in parsimonhi/animCJK. The XHR request from my HTTPS GitHub page fails due to mixed content. See https://github.com/hugolpz/animCJK/issues/3.

  2. Brush: you are reconstructing the hidden parts; that's impressive. I had to do it by hand.

  3. Open source = no deadlines. Document well, follow your own happy flow.

Note: Sometimes, for strategic reasons, it can be helpful to keep some open content unpublished until some strategic objectives are met. This can help the project. If we are in such a situation, please let me know, and I will just wait for your later, official release.

hugolpz commented 3 years ago

I see you have APIs which return a mix of html and svg via getSvg.php and others.

Do you have the usual svgHans, Ja, svgHant folders, served over the HTTPS protocol, with cross-domain queries allowed? If so, I could change my local query:

    file=svgsDir+"/"+dec+".svg";
    xhr.open("GET",file,true);

into a cross-domain query against your data:

    file=apiUrl+'/'+svgsDir+'/'+dec+'.svg';
    xhr.open("GET",file,true);

Note: If not, that's OK. My project is stable, so I can leave it as it is.

parsimonhi commented 3 years ago

Hello,

The "experimental" area of AnimCJK is experimental. :-) As a result it contains many errors, so take care.

If you want to get some characters from the experimental area, it is sometimes complicated because characters are stored in more than ten different folders.

To keep things simple, you can try to get radical characters from the http://gooo.free.fr/animCJK/svgsZh/ folder only. In this folder, the file name of each character is suffixed with a "z" (this suffix means the character strokes are not yet "brushed").

For instance, 齒 is in http://gooo.free.fr/animCJK/svgsZh/40786z.svg. Display it in a browser, copy the source code into a text file, and name it "40786.svg" (without the "z" suffix).

Then you can put this file in the samples/svgs folder of "your" animCJK project (this folder is used to contain any additional characters that are not already in the official release of animCJK).

Finally, run samples/imageFactory.html of "your" animCJK project in a browser, select the "svgs" radio button, enter 齒 in the character field, and click on the "Create" button. Brushed characters will be generated as gif images.

parsimonhi commented 3 years ago

Hello,

You cannot get the svg sources of the experimental part of animCJK using ajax, but you can get them using curl.
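The manual steps described earlier (take the "z"-suffixed file from the svgsZh folder and save it without the suffix in samples/svgs) could be scripted roughly as follows. This is a sketch, not a tested recipe: the URL and folder names come from this thread, the function names are invented, and it is an assumption that the server answers a plain `urllib` request the same way it answers curl.

```python
import urllib.request
from pathlib import Path

# Experimental folder named in this thread; file names there end in "z".
EXPERIMENTAL_BASE = "http://gooo.free.fr/animCJK/svgsZh"


def experimental_url(char: str) -> str:
    """URL of the un-brushed svg: decimal code point plus a 'z' suffix."""
    return f"{EXPERIMENTAL_BASE}/{ord(char)}z.svg"


def local_target(char: str, project_root: str) -> Path:
    """Destination in samples/svgs, named without the 'z' suffix."""
    return Path(project_root) / "samples" / "svgs" / f"{ord(char)}.svg"


def fetch(char: str, project_root: str) -> Path:
    """Download one character and save it under its final name.

    Assumption: the server accepts a plain HTTP GET from urllib.
    """
    target = local_target(char, project_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(experimental_url(char)) as response:
        target.write_bytes(response.read())
    return target
```

For 齒, `experimental_url` yields the 40786z.svg address quoted above, and `fetch("齒", "animCJK")` would write `animCJK/samples/svgs/40786.svg`, ready for samples/imageFactory.html.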

parsimonhi commented 2 years ago

Hello,

I added the missing radicals. Note that many of them are not identical in svgsJa and svgsZhHans (because they do not have the same stroke order, the same glyph, or the same number of strokes).

hugolpz commented 2 years ago

Nice :D I will have some git merge to do, then it will unlock my workflow. 😄 2022 likely. Thank you @Parsimonhi !

Before closing this Kangxi radicals issue :

parsimonhi commented 2 years ago

Hello,

First, to be sure we are talking about the same thing: by ja, zhHans or zhHant, I mean something that corresponds to the language code of a webpage, such as ja, zh-hans or zh-hant (the one you put in <html lang="ja">, for instance).
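As a small illustration of that correspondence, the sketch below turns a page language code into a folder name of the form used in this thread (svgsJa, svgsZhHans). The helper `svgs_folder` is hypothetical, and the assumption that every folder follows the same "svgs" plus camel-cased tag pattern is mine, not something animCJK guarantees.

```python
def svgs_folder(lang_tag: str) -> str:
    """Build a folder name such as 'svgsZhHans' from a tag such as 'zh-hans'.

    Assumption: folder names are 'svgs' followed by the camel-cased tag.
    """
    parts = lang_tag.replace("_", "-").split("-")
    return "svgs" + "".join(part.capitalize() for part in parts)


if __name__ == "__main__":
    for tag in ("ja", "zh-hans", "zh-Hant"):
        print(tag, "->", svgs_folder(tag))
```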

In svgsJa, 廴 : 3 strokes (I am sure of that). When you have a doubt in Japanese, see https://kakijun.jp (not https://kakijun.com).

About the glyphs, one cannot just reproduce the Kangxi glyphs as is in kaisho (楷書/楷书) style (which is the style used in animCJK and the style used in wikimedia). One should conform as much as possible to the customs of each country for this style. Kangxi glyphs are very close to (or the same as) the style displayed with the zh-hant lang code. But in ja or zhHans, several radicals have (slightly) different glyphs. Note that it is not a question of simplified characters. When a character is really simplified in ja or zhHans, it has a different Unicode code point.

For instance, 龜 :

In zhHans (when 龜, which is a traditional character, is used in a simplified Chinese text): https://www.zhihu.com/question/20317770
In ja (龜 is an uncommon character in Japanese): https://kakijun.jp/page/kame16200.html
In zhTw: https://stroke-order.learningweb.moe.edu.tw/practice.do?lang=en&word=%E9%BE%9C
In zhHk: https://www.edbchinese.hk/lexlist_en/ (enter 龜 in the "Direct input character" field, then click on the "Show" button)
Kangxi: https://www.kangxizidian.com/kangxi/1537.gif

Simplified in ja (different Unicode code point), 亀 : https://kakijun.jp/page/11235200.html
Simplified in zhHans (different Unicode code point), 龟 : https://www.archchinese.com/chinese_english_dictionary.html?find=%E9%BE%9F

Sometimes, it is just the shape of one or two strokes which is different, as for 黹 (check the 6th and 7th strokes):

ja: https://kakijun.jp/page/chi12200.html
zhTw: https://stroke-order.learningweb.moe.edu.tw/practice.do?lang=en&word=%E9%BB%B9 (same as in ja)
Kangxi: https://www.kangxizidian.com/kangxi/1522.gif (same as in ja)
zhHans: https://www.archchinese.com/chinese_english_dictionary.html?find=%E9%BB%B9 (different from ja, zhTw and Kangxi)

As a result, the difference between ja, zhHans and zhHant, for a given Unicode code point, cannot be just a question of stroke order. It is also a question of glyph, even if the glyphs are close to each other in all languages. And sometimes it is also a question of number of strokes, as for 廴 (with no glyph alteration), or 禸 (with glyph alteration).

You can also compare the glyphs using Noto fonts (warning: there are several Noto font sets). Most of the time, you will see the same kind of difference if you use the correct Noto font set for a given lang code. But there are of course exceptions: for some characters, Noto and kaisho style show a different result. For instance, characters with the 糹/糸 radical such as 紙 have the same glyph as in Chinese in kaisho style, but are different in Noto (and the stroke order is also different in the ja kaisho style compared to a "normal" ja style for these characters).

In summary, one cannot use the Kangxi glyphs as is in ja or zhHans (even when it is a traditional character used in a simplified Chinese text). However, one can probably use them as is in zhHant. I do not know zhHant well enough to be sure of that, but there are certainly small differences between zhHant and zhTw.

parsimonhi commented 2 years ago

Hello,

I just checked some radicals, comparing Kangxi versus Noto with zh-hant and Taiwanese hanzi (https://stroke-order.learningweb.moe.edu.tw/characters.do?lang=en). There are glyph differences between Kangxi and the two others (which seem to give the same result). Check for instance 禸 (3rd stroke), 骨 (9th and 10th strokes), 雨 (6th and 7th strokes), 舟 (6th stroke), 角 (7th stroke).

Note that Japanese glyphs seem closer to Kangxi than traditional Chinese glyphs!

hugolpz commented 2 years ago

That is why I suggested having both the radical code points and the character code points in your project.

While I don't know the details, some items have slight differences in glyph shapes and number of strokes between their Kangxi radical code points and their localized character code points.

parsimonhi commented 2 years ago

Hello,

You said:

While i don't know the details, some items have slight differences in glyph shapes and number of strokes between their kangxi radicals points and their character unicode point.

I see what you mean. Interesting viewpoint. And it gives me another idea.

Besides "CJK UNIFIED IDEOGRAPH" (what I am currently using in animCJK) and "CJK RADICAL" (what you suggested), there are also some characters (with different Unicode code points) called "CJK COMPATIBILITY IDEOGRAPH" that are designed to show the glyph in another language for a given character. There are already some samples of that in the svgsJa folder of animCJK. See for instance 勉 (U+52C9, 21193.svg) and its compatibility counterpart (U+FA33, 64051.svg) in svgsJa. And for Kangxi radicals, there are some additional characters (with other Unicode code points) called "KANGXI RADICAL". See for instance https://en.wiktionary.org/wiki/%E9%BE%9C (龜).
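These families of code points can be told apart programmatically. The sketch below uses Python's standard `unicodedata` module to print the name and decomposition of the three kinds of character mentioned here; the `describe` and `survives_nfc` helpers are only illustrative names.

```python
import unicodedata


def describe(char: str) -> dict:
    """Collect the properties that distinguish the code point families."""
    return {
        "code_point": f"U+{ord(char):04X}",
        "decimal": ord(char),  # the number used in names like 64051.svg
        "name": unicodedata.name(char),
        "decomposition": unicodedata.decomposition(char),
    }


def survives_nfc(char: str) -> bool:
    """True if NFC normalization leaves the character unchanged."""
    return unicodedata.normalize("NFC", char) == char


if __name__ == "__main__":
    samples = (
        "\u52c9",  # CJK unified ideograph (21193.svg)
        "\ufa33",  # CJK compatibility ideograph (64051.svg)
        "\u2fd4",  # Kangxi radical for the turtle radical
    )
    for char in samples:
        print(describe(char), "survives NFC:", survives_nfc(char))
```

One practical consequence: the compatibility ideograph has a canonical decomposition, so any tool that applies NFC normalization silently turns U+FA33 into U+52C9, while the Kangxi radical is only folded under NFKC. File names based on the decimal code point keep the distinction, but text that passes through a normalizing step may not.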

However, it seems like a total mess (I haven't figured out what to do yet)! But perhaps we can hope to see one day in animCJK the glyph of radicals as in Kangxi, whatever the language in use, using one of these other unicode codes.

hugolpz commented 2 years ago

@parsimonhi, I think you have a clearer understanding of this issue, and I can't help much with my current understanding. I've been out of this field (not doing proper reading and character analysis) for a decade. It is best that you lead as you see fit indeed : )