ogallagher / wordsearch_generator

Multilingual wordsearch (word search) generator
https://wordsearch.dreamhosters.com
MIT License

Reduce the uncommon characters in Chinese and Korean #32

Closed: GrimPixel closed this issue 2 years ago

GrimPixel commented 2 years ago

In the demo, most of the Chinese and Korean characters are obsolete or rarely used, so puzzles filled with them are no fun at all.

ogallagher commented 2 years ago

Convenient Korean syllable relative-frequency data, with a download of the results in 가나다순 (Korean alphabetical order).

GrimPixel commented 2 years ago

For Chinese, there are lists of frequent characters.

For the People's Republic of China: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8

For the Republic of China: https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

ogallagher commented 2 years ago

Adding an alphabet_char_sets/ directory containing newline-delimited (\n) character list files, which are referenced from a new charsets member on each alphabet.
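A minimal sketch of how a newline-delimited character list could be read, written in Python purely for illustration. The function name `load_charset` and the demo file name are hypothetical and are not taken from the repository; the project's actual loader may work differently.

```python
import os
import tempfile


def load_charset(path):
    """Read a newline-delimited character list, skipping blank lines and duplicates."""
    chars = []
    seen = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            ch = line.strip()
            if ch and ch not in seen:
                seen.add(ch)
                chars.append(ch)
    return chars


# Throwaway file standing in for a file under alphabet_char_sets/
# (the file name is made up for this example).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "ko_demo.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("가\n나\n다\n\n가\n")

demo_charset = load_charset(demo_path)
print(demo_charset)
```

Keeping one character per line makes the files easy to diff and to trim by hand, which suits the later tuning of each set.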

ogallagher commented 2 years ago

> For Chinese, there are lists of frequent characters.
> For the People's Republic of China: https://zh.wikisource.org/wiki/%E9%80%9A%E7%94%A8%E8%A7%84%E8%8C%83%E6%B1%89%E5%AD%97%E8%A1%A8
> For the Republic of China: https://zh.wikisource.org/wiki/%E5%B8%B8%E7%94%A8%E5%9C%8B%E5%AD%97%E6%A8%99%E6%BA%96%E5%AD%97%E9%AB%94%E8%A1%A8

Corresponding character set files:

ogallagher commented 2 years ago

For charset selection, use a widget similar to the controls I made for the alphabet and the example config file.

ogallagher commented 2 years ago

Modify the WordsearchGenerator constructor to use the selected charset if it is populated.
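The fallback described here could look roughly like the sketch below. It is written in Python for illustration only: the class is a simplified stand-in, and the parameter names `code_ranges` and `charset`, the attribute `fill_chars`, and the method `random_cell` are all assumptions rather than the project's real API.

```python
import random


class WordsearchGenerator:
    """Hypothetical, simplified stand-in for the project's generator class."""

    def __init__(self, code_ranges, charset=None, rng=None):
        # code_ranges: list of (first, last) inclusive Unicode code point pairs.
        # charset: optional list of characters selected by the user.
        self.rng = rng or random.Random()
        if charset:
            # A populated charset takes priority over the full alphabet ranges.
            self.fill_chars = list(charset)
        else:
            # None or empty: fall back to every code point in the ranges.
            self.fill_chars = [
                chr(cp)
                for first, last in code_ranges
                for cp in range(first, last + 1)
            ]

    def random_cell(self):
        """Pick one filler character for an empty grid cell."""
        return self.rng.choice(self.fill_chars)


# Unicode Hangul Syllables block, U+AC00 through U+D7A3.
HANGUL_SYLLABLES = [(0xAC00, 0xD7A3)]

with_charset = WordsearchGenerator(HANGUL_SYLLABLES, charset=["가", "나", "다"])
without_charset = WordsearchGenerator(HANGUL_SYLLABLES, charset=[])
```

Treating an empty charset the same as a missing one keeps existing configs working unchanged, since they simply fall through to the full range.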

ogallagher commented 2 years ago

@GrimPixel as of now, the functionality to handle better character sets for Chinese and Korean should be in place. Both cases still err on the side of including too many characters, but further improvement should just be a matter of tuning/customizing what's already been done.

GrimPixel commented 2 years ago

Thanks a lot for your work.

I think I misunderstood something. These lists are “common character lists” rather than “character frequency lists”, so their ordering is not based on frequency.

I have found lists that are ranked by frequency.
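With frequency data, a character set could be trimmed to the most frequent characters that together cover a chosen share of usage. The sketch below is in Python for illustration; `trim_by_coverage` is a hypothetical helper, and the counts are invented toy numbers, not values from any real frequency list.

```python
def trim_by_coverage(freqs, coverage):
    """Keep the most frequent characters until they reach the given share of the total."""
    total = sum(freqs.values())
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    running = 0
    for ch, count in ranked:
        if running >= coverage * total:
            break
        kept.append(ch)
        running += count
    return kept


# Invented counts for illustration only.
toy_counts = {"的": 50, "一": 30, "是": 15, "龘": 5}
trimmed = trim_by_coverage(toy_counts, 0.9)
print(trimmed)
```

A coverage threshold is one possible tuning knob; a fixed top-N cutoff would be an equally reasonable alternative.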

GrimPixel commented 2 years ago

I wonder whether letter frequency could also be applied to other languages when generating the game.
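One way this could work for any alphabet is to draw filler letters with probability proportional to their frequency. The sketch below is in Python for illustration; `weighted_fill` is a hypothetical helper, and the weights are made-up toy values rather than real letter frequencies.

```python
import random
from collections import Counter


def weighted_fill(freqs, n, rng):
    """Draw n filler letters, each chosen with probability proportional to its weight."""
    letters = list(freqs)
    weights = [freqs[ch] for ch in letters]
    return rng.choices(letters, weights=weights, k=n)


# Made-up weights for illustration only.
toy_weights = {"e": 12, "t": 9, "a": 8, "q": 1}
cells = weighted_fill(toy_weights, 1000, random.Random(0))
tally = Counter(cells)
print(tally.most_common())
```

Rare letters still appear occasionally under this scheme, so hidden words containing them do not stand out as much as they would if those letters were excluded outright.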