Closed BLKSerene closed 5 years ago
Hi @BLKSerene
Nagisa uses UniDic's POS tags (https://directory.fsf.org/wiki/Unidic-mecab). You can list Nagisa's POS tags with the following code.
import nagisa
print(nagisa.tagger.postags)
#=> ['動詞', '空白', '記号', '副詞', '接尾辞', 'ローマ字文', '接続詞', '漢文', 'oov', '接頭辞', '助詞', '英単語', '連体詞', '助動詞', '形容詞', '未知語', '名詞', 'URL', '補助記号', '言いよどみ', '代名詞', 'web誤脱', '感動詞', '形状詞']
# English translations of Nagisa's POS tags
ja2en = {
'動詞': "verb",
'空白': "whitespace",
'記号': "symbol",
'副詞': "adverb",
'接尾辞': "suffix",
'ローマ字文': "latin_alphabet",
'接続詞': "conjunction",
'漢文': "chinese_writing",
'oov': "unknown_words",
'接頭辞': "prefix",
'助詞': "particle",
'英単語': "english_word",
'連体詞': "adnominal",
'助動詞': "auxiliary_verb",
'形容詞': "adjective",
'未知語': "unknown_words",
'名詞': "noun",
'URL': "url",
'補助記号': "supplementary_symbol",
'言いよどみ': "hesitation",
'代名詞': "pronoun",
'web誤脱': "errors_omissions",
'感動詞': "interjection",
'形状詞': "adjectival_noun"
}
If you want to convert Japanese POS tags to universal POS tags, please refer to the official English translations of the UniDic POS tags: https://gist.github.com/masayu-a/e3eee0637c07d4019ec9
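As a rough starting point for that conversion, here is a minimal sketch of a lookup-table approach. The `UNIDIC_TO_UPOS` table and the `to_upos` helper are hypothetical names, not part of Nagisa, and the mapping itself is a coarse assumption: it covers only some of the tags, and tags such as 助詞 or 連体詞 may need a finer split depending on their subcategory. Anything not in the table falls back to `X`.

```python
# Illustrative, coarse mapping from top-level UniDic POS tags to Universal POS tags.
# ASSUMPTION: this table is a simplification, not an official conversion.
UNIDIC_TO_UPOS = {
    '動詞': 'VERB',
    '名詞': 'NOUN',
    '形容詞': 'ADJ',
    '形状詞': 'ADJ',
    '副詞': 'ADV',
    '助詞': 'ADP',      # particles; some subtypes may fit SCONJ, CCONJ, or PART better
    '助動詞': 'AUX',
    '接続詞': 'CCONJ',
    '代名詞': 'PRON',
    '連体詞': 'DET',    # adnominals; some may fit ADJ better
    '感動詞': 'INTJ',
    '補助記号': 'PUNCT',
    '記号': 'SYM',
}


def to_upos(ja_tags, mapping=UNIDIC_TO_UPOS, default='X'):
    """Map a list of Japanese POS tags to Universal POS tags.

    Tags missing from the mapping are returned as `default`.
    """
    return [mapping.get(tag, default) for tag in ja_tags]


print(to_upos(['名詞', '助詞', '動詞', '未知語']))
# → ['NOUN', 'ADP', 'VERB', 'X']
```

With Nagisa installed, the input list could come from `nagisa.tagging(text).postags`.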
Thanks a lot for the useful information!
Could you please list the tagset used by Nagisa's POS tagger? I'm asking because I'm trying to convert Japanese POS tags to universal POS tags.