mike-fabian / ibus-typing-booster

ibus-typing-booster is a completion input method for faster typing
https://mike-fabian.github.io/ibus-typing-booster/

[BUG] The compose support works only for sequences starting with Multi_key or dead_ keys, it ignores lines starting with other keys #379

Closed mike-fabian closed 2 years ago

mike-fabian commented 2 years ago

Discovered while researching https://bugzilla.redhat.com/show_bug.cgi?id=2122899

On the AB05 key (that is where the b is on a US English layout), the standard Arabic xkb layout produces U+FEFB:

$ grep -i fefb /usr/share/X11/xkb/symbols/ara 
    key <AB05> {  [           UFEFB,                UFEF5,                  NoSymbol,            NoSymbol ]};  // ‎ﻻ‎ ‎ﻵ‎
    key <AB05> {  [           UFEFB,                UFEF5,                     U06AB,               U06AD ]};  // ‎ﻻ‎ ‎ﻵ‎     ‎ګ‎ ‎ڭ‎

That is not desired; the desired result is U+0644 U+0627.

But with xkb layouts, it is not possible to output more than one keysym per keystroke.

https://www.freedesktop.org/wiki/Software/XKeyboardConfig/XKB2Dreams/ talks about

3. Support for scenarios "multiple keypresses - one keysym" and "single keypress - multiple keysyms".

If xkb were improved like this, one could output U+0644 U+0627 when pressing the AB05 key.

But that is a “dream” and might never happen.

So I suggested that the user use ibus-m17n or ibus-typing-booster with ar-kbd.mim instead. ar-kbd.mim emulates the Arabic keyboard on top of a US English keyboard layout using m17n-lib and can output any string for any keypress. It outputs the desired U+0644 U+0627 when typing b.

But the user had also noticed that it just happens to work in Qt apps. I could not figure out why; it appeared very mysterious to me.

Today, after more discussion with the user avidseeker, we finally stumbled on the reason. It is because of Compose support!

$ grep 'ARABIC LIGATURE' /usr/share/X11/locale/en_US.UTF-8/Compose
<UFEFB> :   "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> :   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> :   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> :   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE

So even though the xkb keyboard for Arabic produces U+FEFB, the Compose support then replaces this by U+0644 U+0627.

That can be tested by starting xterm like this:

env XMODIFIERS=@im=none xterm &

This makes sure that the Compose support of X11 is used and not the Compose support of ibus.

Then in the xterm, type

 echo -n b | iconv -f utf8 -t utf16le | od -x
0000000 0062
0000002

and we see that the b produces U+0062, which is correct.

Switch to the Arabic keyboard,

setxkbmap  ara

type “arrow up” to get the echo -n b | iconv -f utf8 -t utf16le | od -x line back, go back to the b with “arrow left”, type b and now one gets:

echo -n لا | iconv -f utf8 -t utf16le | od -x 
0000000 0644 0627
0000004

I.e. even though the keyboard surely outputs only U+FEFB, the Compose support of Xorg transforms this into U+0644 U+0627.
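
To make the difference between the two outputs concrete, here is a minimal Python sketch that prints the code points of the single ligature character and of the two-character string from the Compose rule. The helper name `code_points` is only illustrative and is not part of any of the tools discussed here.

```python
# What the Arabic xkb layout emits for the AB05 key: one code point.
LIGATURE = "\uFEFB"
# What the Compose rule for <UFEFB> produces instead: two code points.
COMPOSED = "\u0644\u0627"


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return [f"U+{ord(char):04X}" for char in text]


print(code_points(LIGATURE))  # ['U+FEFB']
print(code_points(COMPOSED))  # ['U+0644', 'U+0627']
```

The second list matches the `0644 0627` shown by `od -x` in the xterm test above.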

mike-fabian commented 2 years ago

https://user-images.githubusercontent.com/2330175/188913732-acdac3b2-c788-4c5c-9cd7-a85a1d9b2f3e.mp4

mike-fabian commented 2 years ago

However, when the same test is done in an xterm started like

env XMODIFIERS=@im=ibus xterm &

the test fails, and one gets only U+FEFB:

https://user-images.githubusercontent.com/2330175/188914569-e380480a-99b3-431d-a7af-e16f76e1dcb7.mp4

mike-fabian commented 2 years ago

I.e. even though ibus has compose support, the compose support in ibus apparently does not handle these lines:

$ grep 'ARABIC LIGATURE' /usr/share/X11/locale/en_US.UTF-8/Compose
<UFEFB> :   "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> :   "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> :   "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> :   "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE
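
What makes these rules unusual is that each sequence consists of a single ordinary keysym, with no Multi_key or dead key in front. The following is a rough Python sketch of how such rules could be picked out of Compose-style lines. The regular expression and the function names are assumptions made for illustration and only cover the simple `<keysym> ... : "result"` form, not the full Compose file syntax. The Multi_key and dead_acute sample lines are added only for contrast.

```python
import re

# Simplified pattern for lines like:  <UFEFB> :   "..." # COMMENT
COMPOSE_LINE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"')


def parse_compose_line(line):
    """Return (list_of_keysym_names, result_string), or None if no rule."""
    match = COMPOSE_LINE.match(line)
    if match is None:
        return None
    keysyms = re.findall(r'<([^>]+)>', match.group(1))
    return keysyms, match.group(2)


def starts_with_ordinary_key(keysyms):
    """True if the sequence starts with neither Multi_key nor a dead key."""
    first = keysyms[0]
    return first != 'Multi_key' and not first.startswith('dead_')


SAMPLE_LINES = [
    '<UFEFB> :   "\u0644\u0627" # ARABIC LIGATURE LAM WITH ALEF',
    '<Multi_key> <a> <e> : "\u00e6" ae # sample line for contrast',
    '<dead_acute> <a> : "\u00e1" aacute # sample line for contrast',
]

for sample in SAMPLE_LINES:
    parsed = parse_compose_line(sample)
    if parsed is not None and starts_with_ordinary_key(parsed[0]):
        print(parsed)
```

Of the three sample lines, only the `<UFEFB>` rule is reported, which is exactly the kind of rule that the implementations tested here skip.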

mike-fabian commented 2 years ago

The Compose support in Gtk3 and Gtk4 does not support this either.

If I have a `~/.XCompose` file containing:

$ cat ~/.XCompose
# %H  expands to the user's home directory (the $HOME environment variable)
# %L  expands to the name of the locale specific Compose file (i.e.,
#     "/usr/share/X11/locale/<localename>/Compose")
# %S  expands to the name of the system directory for Compose files (i.e.,
#     "/usr/share/X11/locale")

#include "%L"
include "/%L"

and then test the Gtk3 Compose support in

env GTK_IM_MODULE=gtk-im-context-simple gedit

and the Gtk4 Compose support in

env GTK_IM_MODULE=gtk-im-context-simple gnome-text-editor

and type the b key while the Arabic keyboard layout is active, I get U+FEFB.

So it looks like the Compose support in Gtk3/Gtk4 does not support this either.

mike-fabian commented 2 years ago

Another compose implementation is in ibus-typing-booster and this does not support this either (that’s why I opened this issue here).

In case of ibus-typing-booster, I can obviously see why:

https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/hunspell_table.py#L5533

            return False
        if (not self._typed_compose_sequence
            and not key.name == 'Multi_key'
            and not key.name.startswith('dead_')):
            if DEBUG_LEVEL > 1:
                LOGGER.debug('Not in a compose sequence.')
            return False

I.e. ibus-typing-booster currently treats any sequence that does not start with Multi_key or a dead_ key as not being a compose sequence.
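
One way to avoid hard-coding Multi_key and dead_ would be to ask the compose table itself whether a key can start or continue a sequence. The sketch below only illustrates that idea and is not the code used in ibus-typing-booster. `COMPOSE_TABLE`, `is_compose_prefix` and `handle_key` are hypothetical names, and the table holds just a few sample entries.

```python
from typing import Optional, Tuple

# Hypothetical miniature compose table: keysym-name sequence -> result.
COMPOSE_TABLE = {
    ('Multi_key', 'a', 'e'): '\u00e6',
    ('dead_acute', 'a'): '\u00e1',
    ('UFEFB',): '\u0644\u0627',  # single-keysym sequence, no Multi_key
    ('UFEF5',): '\u0644\u0622',
}


def is_compose_prefix(typed: Tuple[str, ...]) -> bool:
    """True if some sequence in the table starts with the typed keysyms."""
    return any(seq[:len(typed)] == typed for seq in COMPOSE_TABLE)


def handle_key(typed: Tuple[str, ...],
               key_name: str) -> Tuple[Tuple[str, ...], Optional[str], bool]:
    """Return (new_typed_sequence, text_to_commit_or_None, handled)."""
    candidate = typed + (key_name,)
    if candidate in COMPOSE_TABLE:
        # Complete sequence: commit the result and reset the state.
        return (), COMPOSE_TABLE[candidate], True
    if is_compose_prefix(candidate):
        # Incomplete but valid so far: keep collecting keys.
        return candidate, None, True
    # Not part of any compose sequence: let normal key handling continue.
    return (), None, False
```

With this approach, `handle_key((), 'UFEFB')` commits the two-character string immediately, while `handle_key((), 'b')` reports the key as not handled, so ordinary typing is unaffected.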

mike-fabian commented 2 years ago

I will try to fix this in typing booster, but of course that will only make the Arabic xkb keyboard work correctly in typing-booster.

To make it work correctly elsewhere, the Compose support in ibus and in Gtk3/Gtk4 needs to be fixed as well.

mike-fabian commented 2 years ago

This commit in libX11 from 2008-06-20 added the Arabic compose sequences:

https://gitlab.freedesktop.org/xorg/lib/libx11/-/commit/21e464ec682ab23ba20ddf6bd72c6db214cfbe01

commit 21e464ec682ab23ba20ddf6bd72c6db214cfbe01
Author: Khaled Hosny <khaledhosny@eglug.org>
Date:   Thu Jun 19 18:26:11 2008 -0400

    NLS: Add Arabic Lam-Alef ligature compose sequences (bug #16426)

    Add some Arabic digraphs to utf-8 locales with a Compose.pre

    Signed-off-by: James Cloos <cloos@jhcloos.com>

mike-fabian commented 2 years ago

Here is the original bug to add these Arabic compose sequences:

https://bugs.freedesktop.org/show_bug.cgi?id=16426

Khaled Hosny 2008-06-19 05:50:06 UTC

Created [attachment 17228](https://bugs.freedesktop.org/attachment.cgi?id=17228) [[details]](https://bugs.freedesktop.org/attachment.cgi?id=17228&action=edit)
Arabic Compose rules

Arabic keyboard needs the ability to have one key stroke producing two code points, see #8195.
Adding the attached rules to Compose files of UTF-8 locales is needed in order to fix this.

mike-fabian commented 2 years ago

Original discussion about the problem: https://bugs.freedesktop.org/show_bug.cgi?id=8195 which was migrated to GitLab as https://gitlab.freedesktop.org/xorg/xserver/-/issues/346

mike-fabian commented 2 years ago

The test builds of ibus-typing-booster >= 2.18.17 at https://copr.fedorainfracloud.org/coprs/mfabian/ibus-typing-booster/builds/ have a fix for this problem and now also support compose sequences that do not start with Multi_key or dead keys.

This video shows that it works for the special Arabic compose sequence:

https://user-images.githubusercontent.com/2330175/189303691-334b3da4-c81a-46c7-9036-7d1f8199d935.mp4

mike-fabian commented 2 years ago

Included in https://github.com/mike-fabian/ibus-typing-booster/releases/tag/2.19.0