nieldlr / hanzi

HanziJS is a Chinese character and NLP module for Chinese language processing for Node.js
http://hanzijs.com
MIT License

origin of phonetic regularity data? #62

Closed (garfieldnate closed this 3 years ago)

garfieldnate commented 3 years ago

Hi,

Thanks for the wonderful project! I'm looking to replicate the phonetic regularity calculations for Sino-Xenic languages, but there isn't any documentation on how the data was created for this project.

nieldlr commented 3 years ago

Hi @garfieldnate,

Sorry for the late reply; I totally missed this. The phonetic regularity data was all produced by me using Hanzi itself. To answer more specifically:

I haven't seen anyone use the phonetic regularity functions in the wild, as it's a very specific use case. Let me know if you have any more questions!

garfieldnate commented 3 years ago

Thanks! This clears it all up pretty well! I wasn't expecting the data to be manually curated.

I don't want to disappoint you, but I ended up not using hanzi for my application 😅. Instead, I group characters by their original phonetic component (using data from ytenx) and then classify the groups by regularity, following Heisig's Remembering the Kanji, Volume II (the classification code is over here but will probably move at some point). In the end this seemed the most consistent approach (I tried some very messy things before settling on it!).
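Roughly, the group-then-classify idea can be sketched as follows. This is only an illustration, not the linked classification code: the sample entries are hand-entered (one common on-reading per character, romanized), the function names are hypothetical, and the "pure" / "semi-pure" / "mixed" labels with their exception thresholds are an assumed simplification loosely modeled on Heisig's groupings.

```python
from collections import Counter, defaultdict

# Hand-entered toy data: (character, phonetic component, one common on-reading).
# Illustrative only; a real run would load the components from a source such as ytenx.
SAMPLE = [
    ("中", "中", "chuu"), ("仲", "中", "chuu"), ("忠", "中", "chuu"), ("沖", "中", "chuu"),
    ("古", "古", "ko"), ("故", "古", "ko"), ("枯", "古", "ko"), ("固", "古", "ko"), ("苦", "古", "ku"),
    ("寺", "寺", "ji"), ("時", "寺", "ji"), ("持", "寺", "ji"), ("詩", "寺", "shi"), ("待", "寺", "tai"),
]


def group_by_component(entries):
    """Map each phonetic component to the (character, reading) pairs that share it."""
    groups = defaultdict(list)
    for char, component, reading in entries:
        groups[component].append((char, reading))
    return dict(groups)


def classify_group(members):
    """Label a group by how many members deviate from its most common reading.

    Assumed thresholds: 0 exceptions is "pure", exactly 1 is "semi-pure",
    anything more is "mixed".
    """
    counts = Counter(reading for _, reading in members)
    _, top_count = counts.most_common(1)[0]
    exceptions = len(members) - top_count
    if exceptions == 0:
        return "pure"
    if exceptions == 1:
        return "semi-pure"
    return "mixed"


def classify_all(entries):
    """Classify every phonetic-component group found in the entries."""
    groups = group_by_component(entries)
    return {component: classify_group(members) for component, members in groups.items()}


if __name__ == "__main__":
    for component, label in classify_all(SAMPLE).items():
        print(component, label)
```

With this toy data the 中 group comes out as pure, 古 as semi-pure, and 寺 as mixed. Note that a group with a single member is trivially "pure" under these thresholds, so a real pipeline would probably want a minimum group size.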

nieldlr commented 3 years ago

@garfieldnate heh, no worries. All good. Glad you ended up finding a solution!