Open baimafeima opened 6 years ago
That sounds like it could be a pretty helpful tool! So the feature would be to paste in some arbitrary block of Chinese text and get frequency data back from it about which characters are most frequently used?
Yes, exactly. I think Syng would be a great choice for that, especially since Hanzifreq is a terminal-based program without a suitable frontend.
I wouldn’t be able to include the actual hanzifreq script but I would definitely be able to build a tool that does something similar. My question is: would we want just character frequency or word frequency?
I think character frequency would be the feature I would most often use. How would you approach word frequency?
First the text would be tokenized into words, and then the frequency of each tokenized word would be counted.
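To make the tokenize-then-count idea concrete, here is a minimal Python sketch. It is only an illustration, not how Syng or hanzifreq do it: the `tokenize` and `word_frequencies` helpers and the tiny `VOCAB` set are hypothetical, and the segmentation is a naive forward maximum match against that word list. A real implementation would more likely rely on a dedicated Chinese segmentation library and a full dictionary.

```python
from collections import Counter

# Hypothetical mini word list; a real tool would load a full dictionary.
VOCAB = {"学习", "中文", "历史"}


def tokenize(text, vocabulary, max_len=4):
    """Greedy forward maximum matching: at each position take the longest
    substring found in the vocabulary, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def word_frequencies(text, vocabulary):
    """Return (word, count) pairs, most frequent first."""
    tokens = tokenize(text, vocabulary)
    # Keep only tokens made entirely of CJK Unified Ideographs,
    # which drops punctuation, digits and Latin letters.
    words = [t for t in tokens if all("\u4e00" <= c <= "\u9fff" for c in t)]
    return Counter(words).most_common()


sample = "我爱学习中文，我也爱学习历史。"
for word, count in word_frequencies(sample, VOCAB):
    print(word, count)
```

The quality of the word counts depends entirely on the segmentation step, which is why character frequency is the simpler feature to ship first.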
It would be great to be able to paste arbitrary Chinese text into a field/box in Syng and get a Chinese character frequency count at the click of a button. This would make it possible to quickly identify the most important characters to learn from a particular Chinese text, and would help any college student prepare efficiently for exams.
See: https://github.com/czielinski/hanzifreq
https://czielinski.github.io/hanzifreq/hanzifreq/output/frequencies.html
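As a rough illustration of the requested character count, the following Python sketch tallies Chinese characters in a pasted string. It is not the hanzifreq implementation; the `is_hanzi` and `char_frequencies` names are hypothetical, and the sketch assumes that only the main CJK Unified Ideographs block and Extension A need to be counted.

```python
from collections import Counter


def is_hanzi(ch):
    """True for characters in the CJK Unified Ideographs block
    (U+4E00..U+9FFF) or Extension A (U+3400..U+4DBF)."""
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def char_frequencies(text):
    """Return (character, count) pairs, most frequent first.
    Punctuation, whitespace, digits and Latin letters are ignored."""
    return Counter(ch for ch in text if is_hanzi(ch)).most_common()


sample = "我爱学习中文，我也爱学习历史。"
for char, count in char_frequencies(sample):
    print(char, count)
```

A frontend would only need to pass the contents of the text box to a function like this and render the resulting pairs as a table.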