inkle / ink

inkle's open source scripting language for writing interactive narrative.
http://www.inklestudios.com/ink
MIT License

Support CJK Unified Ideographs? #884

Closed Myonmu closed 3 weeks ago

Myonmu commented 7 months ago

I'm currently trying to add Japanese characters to the supported character set, but I have some concerns about adding Kanji.

In Unicode, the block we are interested in is CJK Unified Ideographs, which actually contains characters from Chinese, Japanese and Korean: link

The problem is that the complete set contains 20992 characters, both common and rare. Would that cause performance issues?
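To put the 20992 figure in context, here is a minimal Python sketch (not ink's C# code) of two ways a parser could test membership in the block. It assumes the block spans U+4E00 to U+9FFF; the names `CJK_START`, `CJK_END`, `block_size`, `in_cjk_block` and `cjk_set` are made up for illustration and say nothing about how ink stores its character ranges internally.

```python
# Assumed bounds of the CJK Unified Ideographs block (U+4E00..U+9FFF).
CJK_START = 0x4E00
CJK_END = 0x9FFF


def block_size(start: int, end: int) -> int:
    """Number of code points in the inclusive range [start, end]."""
    return end - start + 1


def in_cjk_block(ch: str) -> bool:
    """Range check: two comparisons, independent of the block size."""
    return CJK_START <= ord(ch) <= CJK_END


# Alternative: materialise the whole block as a hash set.
# Lookups stay constant time on average; the cost is the memory
# for one entry per code point plus a one-off build at start-up.
cjk_set = frozenset(chr(cp) for cp in range(CJK_START, CJK_END + 1))

print(block_size(CJK_START, CJK_END))  # 20992
```

Either way, the per-character lookup cost should not grow with the size of the block, so the main cost of adding the whole block would be the one-off set construction and memory, if a set is what the parser uses.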

I managed to extract the JIS X 0208 characters (around 6000 Kanji), but using CharacterRange.Define is painful because there are lots of "holes" that need to be removed with exclude:. Using a file to enumerate these characters is another option.
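The "holes" could be found mechanically rather than by hand. The following is a rough Python sketch, not part of ink: `to_ranges` collapses a character list into contiguous code point runs, and `holes` lists the gaps between those runs, which is roughly what an exclude list would have to cover. Both function names are hypothetical, and the sample code points are only a small illustration, not the real JIS X 0208 data.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points


def to_ranges(chars: Iterable[str]) -> List[Range]:
    """Collapse characters into sorted, contiguous code point runs."""
    ranges: List[Range] = []
    for cp in sorted({ord(c) for c in chars}):
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Starts a new run.
            ranges.append((cp, cp))
    return ranges


def holes(ranges: List[Range]) -> List[Range]:
    """Gaps between consecutive runs, as inclusive ranges."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:])
    ]


# Illustrative sample only: six code points with two gaps.
sample = [chr(cp) for cp in (0x4E00, 0x4E01, 0x4E03, 0x4E07, 0x4E08, 0x4E09)]
runs = to_ranges(sample)
print([(hex(a), hex(b)) for a, b in runs])
print([(hex(a), hex(b)) for a, b in holes(runs)])
```

Running the same functions over the contents of the attached character list (assuming it is a plain UTF-8 text file) would show how many separate runs and holes the JIS X 0208 Kanji actually produce, which helps decide between many small ranges, one large range with excludes, or the whole block.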

Finally, since characters in the CJK block can be shared between Chinese, Japanese and Korean, it might be more convenient to add the whole block, as long as it doesn't heavily impact performance.

Attachment: an ordered set of JIS X 0208 Kanji characters. JIS0208Kanji.txt

Myonmu commented 7 months ago

Side note: I compiled inklecate with the complete CJK range and plugged it into Inky, and it works without noticeable lag. However, Inky's autocomplete doesn't recognize the characters. Autocomplete already has trouble with Latin Extended, though, so I don't know if that should be fixed.