ismla-japanese-helper / japanese-helper

Web application that analyzes Japanese text and displays pronunciation information, inflection details, English translations, and more. 🇯🇵

Analyze the correspondence between Kuromoji tags and Wiktionary tags. Implement the lookup process. #3

Closed x-ji closed 6 years ago

x-ji commented 6 years ago

IPADIC information

https://hayashibe.jp/tr/mecab/dictionary/ipadic

品詞 (part of speech) / 品詞細分類1 (part-of-speech subcategory 1)

Wiktionary Information

https://en.wiktionary.org/wiki/Wiktionary:About_Japanese

They claim that their methods conform to those used in modern kokugo (national language, i.e. Japanese) texts.

verenablaschke commented 6 years ago

From the Wiktionary dump, I additionally get:

verenablaschke commented 6 years ago

I'm not sure if we should include this in this issue or create a new one, but we will have to merge some of the kuromoji tokens; at least those for inflected verbs/adjectives. Example:

x-ji commented 6 years ago

There is now a first implementation. Merging of inflected verbs is not implemented yet.
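
For reference, a rough sketch of what the missing merging step could look like. This is not the project's code: the `Token` class, the `merge_inflected` function, and the rule "attach every following 助動詞 (auxiliary verb) token to a preceding 動詞 (verb) or 形容詞 (adjective)" are assumptions made for illustration, and the token list in the usage note is written by hand rather than produced by Kuromoji.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a tokenizer result: surface form plus IPADIC 品詞."""
    surface: str
    pos: str


# Assumption: only verbs and adjectives start a mergeable group,
# and only auxiliary verbs are attached to it.
HEAD_POS = {"動詞", "形容詞"}
ATTACHABLE_POS = {"助動詞"}


def merge_inflected(tokens):
    """Merge each verb/adjective token with the auxiliary tokens that directly follow it."""
    merged = []
    in_group = False  # True while the last merged token is an open verb/adjective group
    for tok in tokens:
        if in_group and tok.pos in ATTACHABLE_POS:
            head = merged[-1]
            # Extend the surface form but keep the part of speech of the head.
            merged[-1] = Token(head.surface + tok.surface, head.pos)
        else:
            merged.append(Token(tok.surface, tok.pos))
            in_group = tok.pos in HEAD_POS
    return merged
```

With a hand-written token list such as `[Token("食べ", "動詞"), Token("まし", "助動詞"), Token("た", "助動詞")]`, the three tokens collapse into a single 動詞 token with the surface form 食べました. A real implementation would probably also have to decide how to treat conjunctive particles such as て, which this sketch leaves separate.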

verenablaschke commented 6 years ago

We should probably also add a convertPOSTag method for Wiktionary tokens, where we map all the adjective tags to A and all the verb tags to V (we might also need to merge the noun-related tags). That way, we can establish a better link between Kuromoji tags and Wiktionary tags while still being able to display the additional information that comes with the Wiktionary tags (inflection paradigm, transitivity).
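
A minimal sketch of the idea, written in Python for brevity. The tag strings (`verb-godan-transitive`, `adjective-i`, `proper-noun`, ...) are made up for illustration and do not reflect the actual tags in the Wiktionary dump; the coarse tag `N` for nouns is likewise an assumption, since only A and V are proposed above.

```python
# Hypothetical fine-grained tag prefixes and the coarse tag each one maps to.
COARSE_PREFIXES = (
    ("adjective", "A"),
    ("verb", "V"),
    ("noun", "N"),         # assumption: noun-related tags are merged into N
    ("proper-noun", "N"),
)


def convert_pos_tag(wiktionary_tag):
    """Return a coarse tag (A, V, N) for a fine-grained tag, or the tag unchanged."""
    normalized = wiktionary_tag.strip().lower()
    for prefix, coarse in COARSE_PREFIXES:
        # Prefix matching keeps "adverb" from being mistaken for a verb tag.
        if normalized.startswith(prefix):
            return coarse
    return wiktionary_tag


def describe(wiktionary_tag):
    """Pair the coarse tag (for matching) with the original tag (for display)."""
    return {"coarse": convert_pos_tag(wiktionary_tag), "detail": wiktionary_tag}
```

Here `describe("verb-godan-transitive")` yields the coarse tag `V` for matching against Kuromoji while the original tag is kept alongside it, so the inflection paradigm and transitivity can still be shown.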

x-ji commented 6 years ago

Created a new issue for merging tokens to look for potential inflections.