annisat opened this issue 1 year ago
Hi, this is expected behavior. "エドワード" and "ジョン" exist in the mecab-ipadic dictionary, but there are no entries for "ケネディー" and "マッケイン".
In terms of morphological analysis, those are "unknown" words, so they carry no morphological information such as pronunciation, only an estimated POS tag.
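To pick out the tokens that lack a pronunciation, one option is to check the pronunciation slot of each token's feature string. The following is a minimal sketch, assuming the usual mecab-ipadic layout of nine comma-separated fields with pronunciation last. The function name and the sample feature strings are illustrative, not taken from the original output. Depending on the tool, an unknown word may either have fewer fields or show "*" in that slot, so the sketch treats both cases as missing.

```python
from typing import Optional

# Assumption: mecab-ipadic feature layout is
# POS, POS subcategories 1-3, conjugation type, conjugation form,
# base form, reading, pronunciation (9 comma-separated fields).
PRONUNCIATION_INDEX = 8


def pronunciation_from_feature(feature: str) -> Optional[str]:
    """Return the pronunciation field, or None if it is absent.

    Hypothetical helper: unknown words may have fewer than 9 fields,
    or "*" in the pronunciation slot, depending on the tool.
    """
    fields = feature.split(",")
    if len(fields) <= PRONUNCIATION_INDEX:
        return None  # field omitted entirely
    value = fields[PRONUNCIATION_INDEX]
    return None if value == "*" else value  # "*" means no value


# Illustrative feature strings, not copied from real MeCab output.
known = "名詞,固有名詞,人名,名,*,*,ジョン,ジョン,ジョン"
unknown_short = "名詞,固有名詞,組織,*,*,*,*"
unknown_padded = "名詞,固有名詞,組織,*,*,*,*,*,*"

print(pronunciation_from_feature(known))
print(pronunciation_from_feature(unknown_short))
print(pronunciation_from_feature(unknown_padded))
```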
I see. Thanks for the reply.
In the case of katakana words, maybe the pronunciation could be inferred from the word form? For example, replace every イ that follows an エ段 (e-row) katakana with ー.
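The heuristic suggested here can be sketched in a few lines. This is a minimal sketch under stated assumptions: `lengthen_ei` is a hypothetical name, the e-row set includes the small ェ used in loanword digraphs such as シェ and フェ (a judgment call you may want to drop), and only this single イ→ー rule is applied, so it will not reproduce every pronunciation convention used in the dictionary.

```python
# Katakana in the e-row (エ段). Including the small ェ is an
# assumption, so that digraphs like シェ or フェ also trigger the rule.
E_ROW = set("エケゲセゼテデネヘベペメレヱェ")


def lengthen_ei(katakana: str) -> str:
    """Replace each イ that directly follows an e-row katakana with ー.

    Hypothetical helper implementing only this one heuristic.
    """
    out = []
    for i, ch in enumerate(katakana):
        # Look at the original preceding character, not the rewritten one.
        if ch == "イ" and i > 0 and katakana[i - 1] in E_ROW:
            out.append("ー")
        else:
            out.append(ch)
    return "".join(out)


print(lengthen_ei("マッケイン"))
print(lengthen_ei("ケネディー"))
```

Applied to the words from this issue, マッケイン becomes マッケーン, while ケネディー is left unchanged because it contains no full-size イ.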
It depends on the application, but converting consistently with the way the dictionary entries are written makes sense to me.
I need to segment some sentences and get their pronunciations. Some katakana words don't seem to have any pronunciation information. I can of course transcribe them myself using katakana pronunciation rules, but I'm wondering whether this is by design or a bug.
Here's the code that reproduces the issue
And here's the output
The last column for ケネディー and マッケイン is "*", while エドワード and ジョン have a value there.