google / mozc

Mozc - a Japanese Input Method Editor designed for multi-platform

Cannot find the POS for: 動詞,自立,*,*,ラ変,未然形,* #920

Closed by utuhiro78 1 month ago

utuhiro78 commented 2 months ago

Description

I tried to build 3c10b49fcf43db32f578acb49a45cfb4fe5e1cff. It shows the warnings:

INFO: From Executing genrule //data_manager/oss:mozc_dataset_for_oss@user_pos:
WARNING:root:Cannot find the POS for: 動詞,自立,*,*,ラ変,未然形,*
WARNING:root:Cannot find the POS for: 動詞,自立,*,*,ラ変,体言接続,*
WARNING:root:Cannot find the POS for: 動詞,自立,*,*,ラ変,仮定形,*
WARNING:root:Cannot find the POS for: 動詞,自立,*,*,ラ変,命令e,*
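
A warning of this shape presumably comes from looking up a POS feature string in id.def and failing to find it. The following is a minimal sketch of such a check, not Mozc's actual build script: the function names `parse_id_def` and `find_missing_pos` are hypothetical, and it assumes each id.def line has the form `<id> <feature string>`, as in the `639 動詞,自立,*,*,ラ変,未然形,*` line quoted later in this thread.

```python
import logging


def parse_id_def(text):
    """Map each POS feature string to its numeric ID (assumed '<id> <feature>' per line)."""
    pos_to_id = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        pos_id, feature = line.split(maxsplit=1)
        pos_to_id[feature] = int(pos_id)
    return pos_to_id


def find_missing_pos(pos_to_id, required):
    """Return the required feature strings absent from id.def, warning about each one."""
    missing = [feature for feature in required if feature not in pos_to_id]
    for feature in missing:
        logging.warning("Cannot find the POS for: %s", feature)
    return missing


# Illustrative input only: a one-line stand-in for a real id.def.
sample_id_def = "639 動詞,自立,*,*,ラ変,未然形,*\n"
required = [
    "動詞,自立,*,*,ラ変,未然形,*",
    "動詞,自立,*,*,ラ変,体言接続,*",
]
missing = find_missing_pos(parse_id_def(sample_id_def), required)
```

With this sample, only the 体言接続 form is reported, because it is the only required string without a matching id.def line.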

Steps to reproduce

Steps to reproduce the behavior:

  1. bazel build package --config oss_linux -c opt

Environment

Thank you for updating the OSS dictionary!

hiroyuki-komatsu commented 2 months ago

Hi utuhiro78,

I have confirmed that neither the new dictionary nor id.def contains those ラ変 forms.

I've begun investigating solutions to address the warnings. Please let us know if you also encountered actual issues beyond the build warnings.

Thank you!

utuhiro78 commented 2 months ago

Maybe src/data/test/dictionary is not compatible with the latest OSS dictionary. The id.def includes "ラ変," entries.

utuhiro78 commented 2 months ago

> Please let us know if you also encountered actual issues beyond the build warnings.

I checked mozc-20240405.

cd /src/data/dictionary_oss/
rg ラ変\,未然形 id.def 
640:639 動詞,自立,*,*,ラ変,未然形,*

rg '\t639\t' dictionary0*
dictionary04.txt
151009:あら   639 639 0   あら

I entered these entries with mozc-20240415.

そこにあらず
そこにあり
そこにあります
そこにある
そこにあるけど
そこにあれど
そこにあれ

No problems. :-)
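
The `rg '\t639\t'` search above can also be sketched in Python, for cases where the matching entries need further processing. This is a rough sketch under stated assumptions: the helper name `entries_with_pos_id` is hypothetical, and the column meanings (reading, left ID, right ID, cost, surface) are inferred from the `あら 639 639 0 あら` line shown above rather than taken from Mozc documentation.

```python
def entries_with_pos_id(lines, pos_id):
    """Collect tab-separated dictionary entries whose left or right ID equals pos_id."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Assumed column order: reading, left ID, right ID, cost, surface.
        reading, lid, rid, cost, surface = fields[:5]
        if int(lid) == pos_id or int(rid) == pos_id:
            hits.append((reading, int(lid), int(rid), int(cost), surface))
    return hits


sample_lines = [
    "あら\t639\t639\t0\tあら\n",       # entry quoted in this thread
    "てすと\t1\t1\t100\tテスト\n",     # made-up entry, for contrast only
]
matches = entries_with_pos_id(sample_lines, 639)
```

In practice the lines would come from iterating over the opened dictionary0*.txt files instead of `sample_lines`.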

hiroyuki-komatsu commented 2 months ago

Thank you for the confirmation.

We have updated the dictionary generation logic to keep ラ変 entries. Future dictionaries will contain the full set of ラ変 entries again.

Best,