miurahr / pykakasi

Lightweight converter from Japanese Kana-kanji sentences into Kana-Roman.
https://codeberg.org/miurahr/pykakasi
GNU General Public License v3.0
421 stars, 54 forks

Hiragana for Number Counters #116

Closed victorneo closed 3 years ago

victorneo commented 3 years ago

Describe the bug

My use case: I am using pykakasi to help generate Furigana from a given text or paragraph by using the Hiragana output.

Some counters are converted incorrectly. For example, "十歳" becomes {"orig": "十", "hira": "じゅう"}, {"orig": "歳", "hira": "とし"}, so 歳 is read as とし instead of さい.

I am not sure whether this is a bug in kakasi, or whether there is a way to customise pykakasi so that 歳 is always converted to さい when a number precedes it.
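For readers with the same use case, here is a minimal sketch of a post-processing workaround, assuming the converter returns a list of dicts with "orig" and "hira" keys as in the example above. The helper names `fix_age_counter` and `ends_with_number` and the `KANJI_DIGITS` set are hypothetical and not part of pykakasi. The sketch only rewrites the reading of 歳; it does not adjust the reading of the preceding number (十歳 is normally read じゅっさい or じっさい), so it is a stopgap rather than a replacement for proper dictionary entries.

```python
# Hypothetical helper, not part of pykakasi: patch the reading of the
# age counter 歳 when the previous token ends in a number.
KANJI_DIGITS = set("〇一二三四五六七八九十百千")


def ends_with_number(text):
    """Return True if the last character is a kanji numeral or a digit."""
    return bool(text) and (text[-1] in KANJI_DIGITS or text[-1].isdigit())


def fix_age_counter(tokens):
    """Return a copy of tokens where 歳 after a number is read さい.

    `tokens` is assumed to be a list of dicts with at least the keys
    "orig" and "hira", as shown in the issue text.
    """
    fixed = []
    for i, tok in enumerate(tokens):
        tok = dict(tok)  # copy so the caller's data is not mutated
        prev_orig = tokens[i - 1]["orig"] if i > 0 else ""
        if tok["orig"] == "歳" and ends_with_number(prev_orig):
            tok["hira"] = "さい"
        fixed.append(tok)
    return fixed


# Token list copied from the issue description.
sample = [
    {"orig": "十", "hira": "じゅう"},
    {"orig": "歳", "hira": "とし"},
]
print([t["hira"] for t in fix_age_counter(sample)])  # ['じゅう', 'さい']
```

A standalone 歳 with no number before it keeps its original reading, so ordinary uses of とし are left alone.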

To Reproduce

Steps to reproduce the behavior:

Use the input text: "ほかの十人は同じ施設の関係者で、六人は十歳未満の子どもでした。"

Result: 歳 is mapped to とし instead of さい.

miurahr commented 3 years ago

pykakasi uses a dictionary search to convert words. You can contribute new dictionary entries for these words. Please check src/data/kakasidict.utf8.

victorneo commented 3 years ago

Thank you! I shall add the ages into the dictionary and open a PR.

Are you ok if I add the missing ages up to 100 and open a PR for it?

miurahr commented 3 years ago

@victorneo Word contributions to the dictionary have always been welcome for the dictionary-based functions. It's vital!

miurahr commented 3 years ago

v2.0.6 out.