google / mozc

Mozc - a Japanese Input Method Editor designed for multi-platform
Other

Typing issue: Maximum conversion candidate results limit of 100 seems to hide conversion of given name 凱 (かい) (kai) when typed with only lower case letters. #847

Closed AlexFolland closed 6 months ago

AlexFolland commented 7 months ago

Category of the typing issue

  1. Out-of-vocabulary (e.g. "凱" is not in the candidate list at all when typed with only lower-case letters).

Issues

Write issues to the following table. (It's in the markdown format)

| input [e.g. ゆうひ] | expected [e.g. 夕日] | actual [e.g. ユウヒ] |
|---|---|---|
| kai -> かい | 凱 | limit of 100 results not including 凱 |

Version or commit-id

2.26.4632.102.g4d2e3bd-2

You can get the version string by converting "Version" or "ばーじょん".

Aside: this did not work for me. Converting "Version" gave "Version" as the only conversion candidate. ばーじょん did not show the version number among its conversion candidates either, even when I typed "ba-jyonn", which did produce ばーじょん in hiragana before conversion. I got the version number from my system package manager instead, with pacman -Qi fcitx5-mozc.

Additional context

I am running Manjaro with the unstable branch and my system is up to date.

I looked through all 100 kanji conversion candidates displayed when typing "kai" (which converts to "かい") and did not find 凱, which is the name of a family member. Because there is a limit of 100 conversion candidates, I suspect the candidate exists but would only be visible if that limit were raised or removed entirely. I searched extensively for a place to configure the 100-candidate limit and could not find one. If 凱 does exist as a conversion candidate for かい when typed in lower-case romaji, then this issue report is primarily a request to raise or disable the limit of 100 conversion candidates.
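The hypothesis in this paragraph, that 凱 exists as a candidate but is ranked past a display cap, can be sketched as follows. This is not Mozc code: the ranked list is entirely invented, and `CANDIDATE_LIMIT`, `visible_candidates`, and `rank_of` are hypothetical names used only to illustrate how a fixed cap hides low-ranked entries.

```python
# Hypothetical sketch of how a fixed display cap hides low-ranked candidates.
# Nothing here is taken from Mozc; the list and its order are invented.

CANDIDATE_LIMIT = 100  # the cap observed in this issue


def visible_candidates(ranked, limit=CANDIDATE_LIMIT):
    """Return only the first `limit` candidates, as a capped window would show."""
    return ranked[:limit]


def rank_of(ranked, target):
    """Return the 1-based position of `target` in the full list, or None if absent."""
    try:
        return ranked.index(target) + 1
    except ValueError:
        return None


# Invented full list: 104 placeholders, then 凱 at position 105, then 15 more.
ranked = (
    [f"候補{i}" for i in range(1, 105)]
    + ["凱"]
    + [f"候補{i}" for i in range(106, 121)]
)

if __name__ == "__main__":
    print(rank_of(ranked, "凱"))               # 105
    print("凱" in visible_candidates(ranked))  # False
```

Under this assumption the entry is present in the data but unreachable from the capped window, which is why the question of whether the cap can be changed matters.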

mikunimaru commented 7 months ago

Is it really Mozc? I think it's likely that Mozc is merely installed while another engine is actually enabled.

AlexFolland commented 7 months ago

I don't have any other Japanese engines installed. Here's a screenshot of my Fcitx 5 menu showing only Mozc as an enabled engine. I'm curious what makes you suspect it's a different engine. Is some behavior I've described different from what you expected? If so, what behavior is that?


mikunimaru commented 7 months ago

It usually looks like this: [two screenshots taken in an Arch Linux VirtualBox VM]

Also, the version of Mozc in the official Manjaro (Arch Linux) package is very old. Get the latest package from the AUR or a similar source.


On Arch Linux, you need to set the environment variables and change the DE settings so that fcitx5 starts automatically at login. I think it's probably the same on Manjaro.

AlexFolland commented 7 months ago

I guess the package I am using must be out of date and still has an old limit of 100 conversion candidates. It is indeed Mozc, but perhaps an old version from before this issue was resolved and before the version query was implemented.

I see the upstream copy of this particular package was flagged out of date months ago here: https://archlinux.org/packages/extra/x86_64/fcitx5-mozc/

To work around the outdated package, I tried to install the fcitx5-mozc-git package from the AUR, but the installation failed. Part of the build downloads a file named jigyosyo-202011.zip from https://osdn.net/projects/ponsfoot-aur/storage/mozc/jigyosyo-202011.zip, and that download fails with a certificate-expiry error, which aborts the installation.

I will wait for a proper package update in the Arch Linux extra repository (which will later reach the Manjaro stable extra repository) and/or a fix to the AUR package, and test this again once either package has been updated on my system.

hiroyuki-komatsu commented 7 months ago

Hi AlexFolland,

We have an entry for "かい → 凱": https://github.com/google/mozc/blob/master/src/data/single_kanji/single_kanji.tsv#L627

So the latest version should be able to type 凱 from かい as mikunimaru mentioned.

I'm closing this issue. Please feel free to reopen or update it if you keep having trouble with the latest version.

Thanks,

AlexFolland commented 6 months ago

I was able to update to the latest version of Mozc from the AUR with fcitx5-mozc-git. Before installing the package, I edited its PKGBUILD and removed the "s" from "https" in the sources for the 2 zip files on the OSDN server. That worked around the server's expired SSL certificate and allowed curl to download the files.
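The PKGBUILD edit described here can be scripted. This is a sketch assuming the OSDN sources appear as literal `https://osdn.net/` URLs in the PKGBUILD; `downgrade_osdn_sources` and `patch_pkgbuild` are hypothetical helper names. Dropping to plain HTTP removes transport security, so the workaround is only reasonable because makepkg still verifies each download against the checksum arrays in the PKGBUILD (unless an entry is `SKIP`).

```python
import re
from pathlib import Path

# Matches only OSDN URLs; every other source in the PKGBUILD is left untouched.
OSDN_HTTPS = re.compile(r"https://osdn\.net/")


def downgrade_osdn_sources(pkgbuild_text: str) -> str:
    """Rewrite https://osdn.net/ URLs to http://osdn.net/ (the workaround in this comment)."""
    return OSDN_HTTPS.sub("http://osdn.net/", pkgbuild_text)


def patch_pkgbuild(path: Path) -> int:
    """Patch a PKGBUILD in place and return how many URLs were rewritten."""
    text = path.read_text(encoding="utf-8")
    new_text, count = OSDN_HTTPS.subn("http://osdn.net/", text)
    path.write_text(new_text, encoding="utf-8")
    return count
```

For example, calling `patch_pkgbuild(Path("PKGBUILD"))` inside the cloned AUR package directory, before running `makepkg -si`, would apply the same change by hand-free means.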

After the update and a reboot, this issue still occurs. There is still a limit of 100 candidates, and 凱 is still not in that limited list. I noticed I could work around the issue by typing "Kai" with an upper-case "K", but that is not explained anywhere and was not what I expected. I thought upper and lower case did not affect the sounds that are produced.

However, that does not solve the problem of the 100-candidate limit, which I still don't see an issue ticket for anywhere. How do I remove the limit of 100 candidates? Nothing on the internet explains this limit, even after hours of research. Can any developer point me in the right direction toward removing it? I am extremely confused as to why my copy of Mozc has a limit of 100 candidates and the one in the screenshots shown by @mikunimaru does not. Is this limit only applied on Linux and not on Windows or macOS, or something like that? If so, how can I fix Mozc or Fcitx so that it does not have that limit?

I'm also confused about why typing ba-jonn does not show the version in my copy. Something's not right, and there's no clear documentation on this anywhere that I can find.

AlexFolland commented 6 months ago

I wish to reopen this ticket since I was told to "please feel free to reopen or update it, if you keep having trouble with the latest version", and I still have trouble with the latest version, as described in my previous comment, but I don't see a way to reopen it. Can someone please reopen it?

mikunimaru commented 6 months ago

It is very likely that an IME engine other than Mozc is enabled. Another IME engine, perhaps installed as a dependency of other Japanese-related software, may already be enabled. For example, is Japanese input still possible after you uninstall Mozc? If so, that confirms a Japanese input engine other than Mozc is enabled. What DE are you using? The settings I can advise depend on the DE.

AlexFolland commented 6 months ago

No, Japanese input is not possible if I uninstall Mozc. It is the only Japanese IME engine that is installed. I do not have any other Japanese-related software installed.

When I switch input with the fcitx5 context menu, I am clicking specifically on a context menu list entry named "Mozc". This causes text that says "あ (Hiragana)" to appear in a tooltip at the typing cursor location in the focused text box.

Additionally, I had no way to enter Japanese text before I installed Mozc with my package manager.

I am using XFCE as my desktop environment, and I can change input sources with fcitx5 through its tray icon with either a left click on the icon or right-click on it and clicking "Mozc" in the list. I can also press Alt+Space on my keyboard to switch to Japanese input, but to confirm that I am using only Mozc for these tests, I have made sure I am clicking the "Mozc" entry in the fcitx5 tray icon context menu.

mikunimaru commented 6 months ago

I will install Manjaro in a virtual environment tomorrow and check it. Can you install fcitx5-configtool and confirm that Mozc is included in the items?

AlexFolland commented 6 months ago

> I will install Manjaro in a virtual environment tomorrow and check it.

Thank you. I expect this may help.

> Can you install fcitx5-configtool and confirm that Mozc is included in the items?

Is that expected to show a different menu from the menu shown in the screenshot in my previous comment? I ran fcitx5-configtool and I see the same configuration menu that's pictured in my screenshot, with the same list as in my screenshot.

mikunimaru commented 6 months ago

There seems to be no problem with fcitx5 settings. My current guess is that ibus or fcitx (not fcitx5) is being prioritized and working as the Japanese IME. This is probably a bug that only occurs with the combination of Manjaro and xfce.

AlexFolland commented 6 months ago

OK, I've checked both of those possibilities. IBus has no engines installed or set, and fcitx (not fcitx5) is not installed. Here is terminal output that should confirm this.

[alex@alex-pc ~]$ ibus list-engine

(ibus list-engine:1290786): IBUS-WARNING **: 00:55:57.320: ibus_bus_call_sync: org.freedesktop.DBus.Properties.Get: GDBus.Error:org.freedesktop.DBus.Error.UnknownProperty: Unknown interface org.freedesktop.IBus or property Engines.
[alex@alex-pc ~]$ ibus engine

(process:1290811): IBUS-WARNING **: 00:56:03.427: ibus_bus_call_sync: org.freedesktop.DBus.Properties.Get: GDBus.Error:org.freedesktop.DBus.Error.UnknownProperty: Unknown interface org.freedesktop.IBus or property GlobalEngine.
No engine is set.
[ble: exit 1]
[alex@alex-pc ~]$ pacman -Q fcitx
error: package 'fcitx' was not found
[ble: exit 1]
[alex@alex-pc ~]$ pacman -Q fcitx5
fcitx5-git 5.1.5.r48.g322eed31-1
[alex@alex-pc ~]$
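The package checks in this output can be wrapped in a small script. This is a sketch assuming `pacman -Q NAME` behaves as shown here: it prints `name version` and exits 0 when a matching package is installed (querying `fcitx5` reported the package `fcitx5-git`), and prints an error and exits non-zero otherwise. `parse_pacman_q` and `installed_version` are hypothetical helper names.

```python
import subprocess
from typing import Optional, Tuple


def parse_pacman_q(returncode: int, stdout: str) -> Optional[Tuple[str, str]]:
    """Turn `pacman -Q NAME` results into (package, version), or None if not installed."""
    if returncode != 0:
        return None
    fields = stdout.split()
    if len(fields) < 2:
        return None
    return fields[0], fields[1]


def installed_version(name: str) -> Optional[Tuple[str, str]]:
    """Run `pacman -Q NAME` and parse the result (requires pacman on PATH)."""
    result = subprocess.run(["pacman", "-Q", name], capture_output=True, text=True)
    return parse_pacman_q(result.returncode, result.stdout)
```

On the system above, `installed_version("fcitx5")` would be expected to return `("fcitx5-git", "5.1.5.r48.g322eed31-1")` and `installed_version("fcitx")` to return `None`.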

If you have more suggestions for ways to test hypotheses, I am willing to run them and show the output.

mikunimaru commented 6 months ago

What are the environment variables for Japanese input? I also suspect that Manjaro may have automatically generated the wrong environment variables. https://fcitx-im.org/wiki/Setup_Fcitx_5#Environment_variables

AlexFolland commented 6 months ago

Here is the output of env, which lists all environment variables and their values.

[alex@alex-pc ~]$ env
SHELL=/bin/bash
SESSION_MANAGER=local/alex-pc:@/tmp/.ICE-unix/2692,unix/alex-pc:/tmp/.ICE-unix/2692
WINDOWID=41418755
COLORTERM=truecolor
XDG_CONFIG_DIRS=/etc/xdg:/usr/share/manjaro-kde-settings/xdg:/etc/xdg
XDG_SESSION_PATH=/org/freedesktop/DisplayManager/Session1
XDG_MENU_PREFIX=xfce-
GNOME_KEYRING_CONTROL=/run/user/1000/keyring
LC_ADDRESS=en_CA.UTF-8
VDPAU_DRIVER=nvidia
LC_NAME=en_CA.UTF-8
SSH_AUTH_SOCK=/run/user/1000/keyring/ssh
GRADLE_HOME=/usr/share/java/gradle
LIBVA_DRIVER_NAME=nvidia
DESKTOP_SESSION=xfce
LC_MONETARY=en_CA.UTF-8
FLUTTER_HOME=/opt/flutter
GTK_MODULES=canberra-gtk-module:canberra-gtk-module
XDG_SEAT=seat0
PWD=/home/alex
LOGNAME=alex
XDG_SESSION_DESKTOP=XFCE
QT_QPA_PLATFORMTHEME=qt5gtk2
XDG_SESSION_TYPE=x11
PANEL_GDK_CORE_DEVICE_EVENTS=0
XAUTHORITY=/tmp/xauth_EYyFsq
ftp_proxy=
MOTD_SHOWN=pam
HOME=/home/alex
MANGOHUD=1
LC_PAPER=en_CA.UTF-8
LANG=en_US.UTF-8
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=01;05;37;41:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.pdf=00;32:*.ps=00;32:*.txt=00;32:*.patch=00;32:*.diff=00;32:*.log=00;32:*.tex=00;32:*.doc=00;32:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
XDG_CURRENT_DESKTOP=XFCE
VTE_VERSION=7402
XDG_SEAT_PATH=/org/freedesktop/DisplayManager/Seat0
https_proxy=
_ble_util_fd_zero=34
YAOURT_COLORS=nb=1:pkg=1:ver=1;32:lver=1;45:installed=1;42:grp=1;34:od=1;41;5:votes=1;44:dsc=0:other=1;35
socks_proxy=
MOZ_DISABLE_RDD_SANDBOX=1
_ble_util_fd_stdin=30
XDG_SESSION_CLASS=user
TERM=xterm-256color
LC_IDENTIFICATION=en_CA.UTF-8
USER=alex
PAM_KWALLET5_LOGIN=/run/user/1000/kwallet5.socket
DISPLAY=:0.0
SHLVL=1
_ble_util_fd_stderr=32
LC_TELEPHONE=en_CA.UTF-8
LC_MEASUREMENT=en_CA.UTF-8
XDG_VTNR=2
XDG_SESSION_ID=2
QT_LINUX_ACCESSIBILITY_ALWAYS_ON=1
http_proxy=
EMSDK_NODE=/usr/lib/emsdk/node/15.14.0_64bit/bin/node
_ble_util_fd_null=33
MOZ_PLUGIN_PATH=/usr/lib/mozilla/plugins
XDG_RUNTIME_DIR=/run/user/1000
DEBUGINFOD_URLS=https://debuginfod.archlinux.org 
LC_TIME=en_GB.UTF-8
GTK3_MODULES=xapp-gtk3-module:xapp-gtk3-module
XDG_DATA_DIRS=/home/alex/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
PATH=/usr/lib/emsdk:/usr/lib/emsdk/upstream/emscripten:/home/alex/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/lib/emscripten:/var/lib/flatpak/exports/bin:/opt/flutter/bin:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl://home/alex/work/ecg/android-studio/bin
DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
HG=/usr/bin/hg
MAIL=/var/spool/mail/alex
EMSDK=/usr/lib/emsdk
LC_NUMERIC=en_CA.UTF-8
_ble_util_fd_stdout=31
_=/usr/bin/env
[alex@alex-pc ~]$

mikunimaru commented 6 months ago

For fcitx5 to work, the environment variables must have the following values: XMODIFIERS=@im=fcitx, GTK_IM_MODULE=fcitx, and QT_IM_MODULE=fcitx. With the current environment variables, something other than fcitx5 may be running.
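A quick way to compare an environment against the three values listed in this comment is sketched below. The variable names and expected values come from the comment; `parse_env_dump` and `missing_or_wrong` are hypothetical helpers. The check only reports mismatches: it cannot tell which input framework is actually handling keystrokes.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Values quoted in the comment as required for fcitx5.
REQUIRED_IM_VARS = {
    "XMODIFIERS": "@im=fcitx",
    "GTK_IM_MODULE": "fcitx",
    "QT_IM_MODULE": "fcitx",
}

# NAME=value, where the value may itself contain "=" (e.g. XMODIFIERS=@im=fcitx).
_ENV_LINE = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_dump(text: str) -> Dict[str, str]:
    """Parse pasted `env` output; prompt lines and other non NAME=value lines are skipped."""
    env = {}
    for line in text.splitlines():
        match = _ENV_LINE.match(line)
        if match:
            env[match.group(1)] = match.group(2)
    return env


def missing_or_wrong(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return {name: actual value or None} for each required variable that does not match."""
    if env is None:
        env = os.environ  # default to the current process environment
    return {
        name: env.get(name)
        for name, expected in REQUIRED_IM_VARS.items()
        if env.get(name) != expected
    }
```

Applied to the `env` output pasted in the previous comment, none of the three variables is present, so all three would be reported with the value `None`.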

hiroyuki-komatsu commented 6 months ago

It's possibly a limitation of Fcitx.

Could you check whether the same issue happens with IBus?

yukawa commented 6 months ago

Let's also convert this to a discussion as this looks more like a Q&A topic on how to dig into a mysterious behavior.