polm / cutlet

Japanese to romaji converter in Python
https://polm.github.io/cutlet/
MIT License
very useful and accurate, it would be even better if it could map kanji to kana #34

Closed theSlowBird closed 1 year ago

polm commented 1 year ago

I am working on changes to map kana to individual tokens. It is not possible to map kana to kanji in the general case because:

  1. UniDic, the data source, only provides kana for tokens, not parts of tokens
  2. While not typical, there are cases of ateji where kana do not apply to individual parts of a compound in any meaningful sense, like 小鳥遊(たかなし)

Ignoring case 2, it is possible to do some mapping of readings as a further refinement, but I'll leave that for an external tool.
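As a rough illustration of the kind of refinement an external tool could do, here is a minimal sketch that aligns a token's reading with its surface by using the kana already present in the surface (okurigana) as anchors. It is not part of cutlet; the function names `kata_to_hira`, `is_kana`, and `split_furigana` are hypothetical, and the input is assumed to be one token's surface plus its kana reading. Each run of non-kana characters gets one reading as a whole, so a compound like 小鳥遊 is never split per character, and ambiguous alignments are resolved naively by taking the shortest match.

```python
import re
from itertools import groupby


def kata_to_hira(text):
    """Convert katakana (ァ..ヶ) to hiragana; other characters are unchanged."""
    return "".join(
        chr(ord(ch) - 0x60) if "ァ" <= ch <= "ヶ" else ch
        for ch in text
    )


def is_kana(ch):
    """True for hiragana, katakana, and the long-vowel mark."""
    return "ぁ" <= ch <= "ゖ" or "ァ" <= ch <= "ヶ" or ch == "ー"


def split_furigana(surface, reading):
    """Pair each non-kana run in `surface` with its slice of `reading`.

    Returns a list of (text, reading) tuples, where reading is None for
    runs that are already kana. If the reading cannot be aligned, the
    whole token is returned as a single pair.
    """
    reading = kata_to_hira(reading)
    # Split the surface into alternating kana / non-kana runs.
    runs = [(kana, "".join(chars)) for kana, chars in groupby(surface, key=is_kana)]
    # Kana runs must appear literally in the reading; each non-kana run
    # becomes a lazy capture group.
    pattern = "".join(
        re.escape(kata_to_hira(text)) if kana else "(.+?)"
        for kana, text in runs
    )
    match = re.fullmatch(pattern, reading)
    if match is None:
        return [(surface, reading)]
    groups = iter(match.groups())
    return [(text, None if kana else next(groups)) for kana, text in runs]


print(split_furigana("食べる", "タベル"))
print(split_furigana("取り扱い", "トリアツカイ"))
print(split_furigana("小鳥遊", "タカナシ"))
```

The lazy match can pick the wrong split when the okurigana also occurs inside a kanji reading, so a real tool would need dictionary support to disambiguate.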

polm commented 1 year ago

Closing because I am not actively working on this. If someone wants to work on a PR and has strategies for the issues above, feel free to contact me about it.