sotch-pr35mac / syng

A free, open source, cross-platform, Chinese-To-English dictionary for desktops.
https://getsyng.com
GNU General Public License v3.0
159 stars, 24 forks

Integrate Chinese Character Frequency Counter #110

Open baimafeima opened 6 years ago

baimafeima commented 6 years ago

It would be great to be able to paste arbitrary Chinese text into a field or box in Syng and get a Chinese character frequency count at the click of a button. This would let users quickly identify the most important characters to learn from a particular Chinese text, and would help college students prepare for exams efficiently.

See: https://github.com/czielinski/hanzifreq
Example output: https://czielinski.github.io/hanzifreq/hanzifreq/output/frequencies.html

These scripts allow the analysis of character frequencies in Chinese text corpora. This might be helpful for Chinese language learners to prioritize common characters when learning how to write.

sotch-pr35mac commented 6 years ago

That sounds like it could be a pretty helpful tool! So the feature would be to paste in some arbitrary block of Chinese text and get frequency data back from it about which characters are most frequently used?
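For illustration, the core of such a feature could look roughly like the sketch below. This is only a minimal Python sketch, not Syng's code and not the hanzifreq script; the function names `is_hanzi` and `char_frequencies` are hypothetical, and it assumes that only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as a Chinese character.

```python
from collections import Counter


def is_hanzi(ch):
    # Assumption: only the basic CJK Unified Ideographs block is treated
    # as a Chinese character; extension blocks are ignored in this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first."""
    # Punctuation, whitespace, digits, and Latin letters are filtered out.
    counts = Counter(ch for ch in text if is_hanzi(ch))
    return counts.most_common()


if __name__ == "__main__":
    pasted = "我爱你，你爱我吗？我"
    for ch, n in char_frequencies(pasted):
        print(ch, n)
```

Pasted text goes in as a plain string and the result is already sorted by count, so a frontend would only need to render the list.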

baimafeima commented 6 years ago

Yes, exactly. I think Syng would be a great choice for that, especially since Hanzifreq is a terminal-based program without a suitable frontend for it.

sotch-pr35mac commented 6 years ago

I wouldn’t be able to include the actual hanzifreq script but I would definitely be able to build a tool that does something similar. My question is: would we want just character frequency or word frequency?

baimafeima commented 6 years ago

> My question is: would we want just character frequency or word frequency?

I think character frequency would be the feature I would most often use. How would you approach word frequency?

sotch-pr35mac commented 5 years ago

First the text would be tokenized into words, and then the frequency of each resulting word would be counted.
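As a rough sketch of that tokenize-then-count approach: no tokenizer is named in this thread, so the example below assumes a simple greedy forward maximum matching against a word list, which is one common dictionary-based way to segment Chinese text. The names `tokenize` and `word_frequencies`, the `max_len` parameter, and the tiny vocabulary are all hypothetical.

```python
from collections import Counter


def tokenize(text, vocabulary, max_len=4):
    """Greedy forward maximum matching against a set of known words.

    Assumption: `vocabulary` is a set of words taken from some dictionary.
    Characters that start no known word become single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Skip anything outside the basic CJK block (punctuation, spaces, Latin).
        if not ("\u4e00" <= text[i] <= "\u9fff"):
            i += 1
            continue
        # Try the longest candidate first and shrink until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def word_frequencies(text, vocabulary):
    """Count tokenized words and return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text, vocabulary)).most_common()


if __name__ == "__main__":
    vocab = {"学习", "中文", "词典"}  # hypothetical, tiny word list
    print(word_frequencies("我学习中文，他也学习中文。", vocab))
```

Greedy matching can split ambiguous text incorrectly, so the quality of the word counts depends heavily on the word list used.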