google / mozc

Mozc - a Japanese Input Method Editor designed for multi-platform

Typing issue: NSFW "スる" (サ変) must be ranked far below "する" (サ変), especially when the preceding word is a サ変 noun #1010

Open tats-u opened 2 months ago

tats-u commented 2 months ago

Category of the typing issue


  1. Word ranking issue (e.g. "夕日" is in the list, but ranking is lower than expected).

Issues


| input | expected | actual |
|---|---|---|
| いんすとーるして | only インストールして | 1. インストールして / 2. インストールシて |


There are probably better examples, but none come to mind.

Version or commit-id

[e.g. Mozc-2.28.4960.100+24.11.oss or d50a8b9ae28c4fba265f734b38bc5ae392fe4d25] You can get the version string by converting "Version" or "ばーじょん".

Additional context


"スる" has two conjugation types (活用の種類): サ変 and サ行五段. The サ変 verb means "make love" or "masturbate" and is NSFW. The サ行五段 verb means "lose (run out of) money" or "pickpocket".

Twitter search for "Mozc　シて": https://twitter.com/search?q=Mozc%E3%80%80%E3%82%B7%E3%81%A6&src=typed_query&f=top
Twitter search for "Google日本語入力　シて": https://twitter.com/search?q=Google%E6%97%A5%E6%9C%AC%E8%AA%9E%E5%85%A5%E5%8A%9B%E3%80%80%E3%82%B7%E3%81%A6&src=typed_query
Gboard Help Community thread reporting "します" being converted to "シます": https://support.google.com/gboard/thread/9218961/%E3%81%97%E3%81%BE%E3%81%99-%E2%86%92-%E3%82%B7%E3%81%BE%E3%81%99-%E3%81%A7%E5%A4%89%E6%8F%9B%E3%81%95%E3%82%8C%E3%82%8B%E3%80%82%E5%A4%89%E3%81%AA%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%AA%E3%82%93%E3%81%A7%E6%AD%A2%E3%82%81%E3%81%A6%E3%80%82?hl=ja

There are far more reports for Google Japanese Input than for Mozc, but I remember running into this issue in Mozc as well several years ago (on Ubuntu 16, 18, or 20).

[!NOTE]

Typing issues will be closed when the entries are added to test cases and evaluations.

https://github.com/google/mozc/blob/master/src/data/test/quality_regression_test/oss.tsv
https://github.com/google/mozc/blob/master/src/data/dictionary_oss/evaluation.tsv

tats-u commented 2 months ago

Also, "スる" (サ変), especially in forms other than 終止形 and 連体形, must never become the first candidate, even after it has been learned from the user's input history. The only exception is when it follows "(noun) + (と or で)".
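To make the proposed rule concrete, here is a minimal Python sketch of a post-ranking filter. It is not Mozc's converter or API: the `Candidate` type, its `cost` field (lower ranks higher, assumed to already include learned history), the `has_sahen_katakana_suru` flag, and the `(surface, part_of_speech)` context format are all hypothetical stand-ins. The sketch moves every candidate that uses the サ変 "スる" entry below all other candidates, regardless of cost, unless the two preceding tokens are a noun followed by と or で.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical candidate type; Mozc's real data structures differ.
@dataclass
class Candidate:
    surface: str                   # converted text, e.g. "インストールして"
    cost: int                      # hypothetical ranking cost; lower ranks higher
    has_sahen_katakana_suru: bool  # hypothetical flag: uses the サ変 "スる" entry


def allows_katakana_suru(prev_tokens: List[Tuple[str, str]]) -> bool:
    """The single proposed exception: the preceding tokens are (noun) + (と or で).

    prev_tokens is a hypothetical list of (surface, part_of_speech) pairs,
    ordered left to right, for the words typed before the candidate.
    """
    if len(prev_tokens) < 2:
        return False
    _, pos_before_particle = prev_tokens[-2]
    particle, particle_pos = prev_tokens[-1]
    return (pos_before_particle == "noun"
            and particle_pos == "particle"
            and particle in ("と", "で"))


def rerank(candidates: List[Candidate],
           prev_tokens: List[Tuple[str, str]]) -> List[Candidate]:
    """Sort by cost, then push サ変 "スる" candidates below all others unless the exception applies."""
    ordered = sorted(candidates, key=lambda c: c.cost)
    if allows_katakana_suru(prev_tokens):
        return ordered
    # Stable partition: relative order inside each group is preserved.
    safe = [c for c in ordered if not c.has_sahen_katakana_suru]
    demoted = [c for c in ordered if c.has_sahen_katakana_suru]
    return safe + demoted


if __name__ == "__main__":
    # Even if learning gave "インストールシて" the lower cost, it is not ranked first.
    cands = [
        Candidate("インストールシて", cost=100, has_sahen_katakana_suru=True),
        Candidate("インストールして", cost=200, has_sahen_katakana_suru=False),
    ]
    print([c.surface for c in rerank(cands, [])])
    # → ['インストールして', 'インストールシて']
```

Demoting instead of dropping keeps the candidate selectable, which matches the request that it never be *first* rather than never appear. This sketch applies the demotion to all forms and does not model the 終止形/連体形 distinction mentioned in the comment.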