skishore / makemeahanzi

Free, open-source Chinese character data
https://www.skishore.me/makemeahanzi/

japanese characters #18

Closed parsimonhi closed 6 years ago

parsimonhi commented 6 years ago

I made a derived project from makemeahanzi called animCJK (https://github.com/parsimonhi/animCJK). It contains svg files for the 2136 "jōyō kanji" in use in Japan and the 3500 "frequently used simplified hanzi".

The SVG files of animCJK are completely different from those of makemeahanzi. However, I made two files (graphicsJa.txt and graphicsZhHans.txt) that have the same format as your graphics.txt, so you can easily import the changes if you wish.

Note that many characters (about one third) are not the same in Japanese and in Chinese even when they share the same Unicode code point (different stroke order, different stroke direction, different glyph, ...). So don't merge the two files without care.

Note also that I recomputed all medians to be sure that a stroke-width of 128 is sufficient to cover all stroke shapes.
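
Before merging, it may help to list the characters on which the two files disagree. The following is a minimal sketch in Python, assuming both files use the makemeahanzi graphics.txt layout (one JSON object per line with `character`, `strokes` and `medians` keys). The function names are illustrative and belong to neither project.

```python
import json


def parse_graphics(lines):
    """Map each character to its record (one JSON object per non-empty line)."""
    records = {}
    for line in lines:
        line = line.strip()
        if line:
            record = json.loads(line)
            records[record["character"]] = record
    return records


def differing_characters(ja, zh):
    """Characters present in both sets whose stroke paths or medians differ."""
    shared = ja.keys() & zh.keys()
    return sorted(
        ch for ch in shared
        if ja[ch]["strokes"] != zh[ch]["strokes"]
        or ja[ch]["medians"] != zh[ch]["medians"]
    )


# Possible usage (file names taken from the comment above):
# with open("graphicsJa.txt", encoding="utf-8") as f:
#     ja = parse_graphics(f)
# with open("graphicsZhHans.txt", encoding="utf-8") as f:
#     zh = parse_graphics(f)
# print(len(differing_characters(ja, zh)))
```

Characters reported by `differing_characters` would need a per-language copy rather than a blind overwrite.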

Hope it helps.

skishore commented 6 years ago

Hey, that's really cool! Did you automate the process of changing the Chinese stroke order graphics into the Japanese ones? I'm curious what approach you took to do that.

I've gotten quite a few requests for versions of the Make Me a Hanzi data for Japanese language learners. I will be sure to forward them to your site!


parsimonhi commented 6 years ago

I did it manually, one by one, using a text editor to modify the SVG files directly (just inverting some paths, i.e. some lines of text: not very difficult and not very long). It is difficult (but not impossible) to automate the process because there are many different cases. As a result, identifying what kind of correction had to be made was the most time-consuming part. Anyway, the hardest part was not correcting stroke orders but correcting glyphs and creating characters that were not defined at all in makemeahanzi (roughly 30% of the corrections modified stroke orders, 30% modified glyphs, 30% created new characters and 10% covered other cases). To correct glyphs or to create new characters, most of the time there was no better way than modifying the SVG in a graphics editor (I used Inkscape).
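
For the stroke-order and stroke-direction cases, the same kind of correction could be expressed on a graphics.txt-style record instead of on the SVG text. This is only a sketch, assuming a record is a dict with parallel `strokes` and `medians` lists and that the order of points in a median gives the drawing direction. It is not the manual procedure described above, and the function names are hypothetical.

```python
def reorder_strokes(record, new_order):
    """Return a copy of the record with strokes and medians permuted together.

    new_order[i] is the index, in the original record, of the stroke that
    should be drawn i-th.
    """
    if sorted(new_order) != list(range(len(record["strokes"]))):
        raise ValueError("new_order must be a permutation of the stroke indices")
    return {
        **record,
        "strokes": [record["strokes"][i] for i in new_order],
        "medians": [record["medians"][i] for i in new_order],
    }


def reverse_stroke_direction(record, index):
    """Return a copy in which the median of one stroke is traversed in reverse."""
    medians = [list(median) for median in record["medians"]]
    medians[index] = medians[index][::-1]
    return {**record, "medians": medians}
```

Both helpers return new records and leave the input untouched, so a Chinese record can be kept alongside its Japanese counterpart.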

skishore commented 6 years ago

Huh, interesting! So for the characters that you had to create from scratch, how did you split the font's glyph into strokes? Do you have an example of one of those that you could show me on the demo page? I wrote this tool that did most of that for me, so it would be interesting to compare notes.


parsimonhi commented 6 years ago

I didn't get the new character strokes from a font glyph, since these characters are not defined in the Arphic fonts, and trying another font doesn't work well (it is too much work to make a glyph from another font look like an Arphic glyph).

So I built the new characters one by one in two steps:

On http://gooo.free.fr/animCJK/all.php, take a look at the character lists. All the characters in green are those that I created. All the characters in blue are those that I modified. All the characters in red are those that I have not created yet.

parsimonhi commented 6 years ago

For instance, I just created 崚. I got the left part from 嶼 and the right part from 绫. In this case, there is no need to use Inkscape because the original components have roughly the right size.

Edit: The 8th stroke of 崚 does not look the same in Japanese and in Chinese (it is a hook in Japanese and a straight line in Chinese). So I took this 8th stroke from 陵, which I had already modified in animCJK. Finally, I used Inkscape to slightly adapt the shape of the stroke because it was a little too large and misplaced. I also slightly modified the 6th and 7th strokes, although it was not really necessary. To see the result and compare the characters, enter 崚嶼绫陵 in the data field of http://gooo.free.fr/animCJK/all.php and check the grid checkbox to see the differences more accurately.
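
Borrowing components from existing characters can be pictured as follows. This is a rough sketch on graphics.txt-style records (dicts with parallel `strokes` and `medians` lists), not the SVG editing actually described above. The function name is hypothetical, the stroke indices a caller passes are its own assumption, and the approach only works when the borrowed strokes already have the right size and position.

```python
def compose_character(character, parts):
    """Build a new record from (source_record, stroke_indices) pairs.

    Parts are listed in drawing order; each borrowed stroke keeps the path
    and median it has in its source record.
    """
    strokes, medians = [], []
    for source, indices in parts:
        for i in indices:
            strokes.append(source["strokes"][i])
            medians.append(source["medians"][i])
    return {"character": character, "strokes": strokes, "medians": medians}
```

A stroke that needs reshaping, like the 8th stroke mentioned above, would still have to be adjusted by hand afterwards.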

Edit 2: http://gooo.free.fr/animCJK/all.php is an experimental version. When you select one of Jōyō and jinmeyō kanji, Hsk hanzi, Frequently used hanzi, Commonly used hanzi or Miscellaneous, the character displayed for a given Unicode code point is not always the same. When Jōyō and jinmeyō kanji is selected, the Japanese-specific version is displayed if one is available; otherwise the version common to Chinese and Japanese is displayed, and failing that the Chinese version. When any other option is selected, the Chinese-specific version is displayed if one is available; otherwise the common version, and failing that the Japanese version.

For an "official" stable version, see http://gooo.free.fr/animCJK/official/index.php. But there are many more characters in the experimental version (about 10000) than in the official stable version, which contains only the 2136 Japanese Jōyō kanji and the 3500 simplified Chinese Frequently used hanzi.
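
The fallback rule can be summarised in a few lines. This is a sketch only, assuming three hypothetical lookup tables (Japanese-specific, common, Chinese-specific) keyed by character; how the site actually stores its variants is not stated here.

```python
def pick_variant(char, ja, common, zh, prefer_japanese):
    """Return the first available version of char, following the fallback order.

    With prefer_japanese the order is Japanese, common, Chinese;
    otherwise it is Chinese, common, Japanese.
    """
    order = (ja, common, zh) if prefer_japanese else (zh, common, ja)
    for table in order:
        if char in table:
            return table[char]
    return None  # no version of this character exists yet
```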

skishore commented 6 years ago

I see, very cool! You've basically created a new font with all those new characters!

I think the Arphic ukai font might have had quite a few of those characters in it. I just stopped building the stroke order graphics after doing the most common Chinese characters.


parsimonhi commented 6 years ago

I missed this point. Thank you for mentioning it.

I found 15 Jōyō kanji in the Arphic font that are not in makemeahanzi. Not a huge number, but it helps.

skishore commented 6 years ago

Since other people have also asked about non-PRC stroke order data, I've linked to your site and KanjiVG from the readme.

parsimonhi commented 6 years ago

Thanks for the link!