mike-fabian / ibus-typing-booster

ibus-typing-booster is a completion input method for faster typing
https://mike-fabian.github.io/ibus-typing-booster/

[BUG] Support compose sequences containing ASCII keys written as `<U00XX>` code points #510

Closed mike-fabian closed 3 months ago

mike-fabian commented 3 months ago

If a compose sequence contains a code point written as `<UXXXX>` and the code point is in the ASCII range, the corresponding key value is 0xXXXX. If the code point is above the ASCII range, the corresponding key value is 0x0100XXXX.
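The rule above can be sketched in a few lines of Python. This only illustrates the mapping as it is described in this issue; it is not code from ibus-typing-booster, and the names `UNICODE_KEYVAL_OFFSET` and `code_point_to_keyval` are made up for the example.

```python
# Offset used for key values of code points above the ASCII range (as described above).
UNICODE_KEYVAL_OFFSET = 0x01000000

def code_point_to_keyval(code_point: int) -> int:
    """Return the key value for a <UXXXX> code point, following the rule in this issue."""
    if code_point < 0x80:
        # ASCII range: the key value is the code point itself.
        return code_point
    # Above ASCII: the key value is the code point with the 0x01000000 offset set.
    return UNICODE_KEYVAL_OFFSET | code_point

print(hex(code_point_to_keyval(0x0061)))  # <U0061> -> 0x61
print(hex(code_point_to_keyval(0x263A)))  # <U263A> -> 0x100263a
```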

Compose sequences that write an ASCII key as a `<U00XX>` code point and compose sequences that write it with its keysym name should work identically, because both work identically in the original Xorg compose implementation, and ibus-typing-booster should not deviate from that without a good reason.
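One possible way to make both spellings equivalent is to resolve every token of a compose sequence to a key value before storing or matching it. The sketch below is not the actual ibus-typing-booster code: `KEYSYM_NAMES` is a deliberately tiny, hand-written name table, and `token_to_keyval` and `parse_sequence` are hypothetical helper names.

```python
import re

# Hypothetical excerpt of a keysym name table, just enough for the example.
KEYSYM_NAMES = {'a': 0x0061, 'A': 0x0041, 'minus': 0x002D, 'EuroSign': 0x20AC}

# Matches code point tokens such as U0061 or U263A (without the angle brackets).
CODE_POINT_TOKEN = re.compile(r'U([0-9A-Fa-f]{4,6})')

def token_to_keyval(token: str) -> int:
    """Resolve one compose token (without angle brackets) to a key value."""
    if token in KEYSYM_NAMES:
        return KEYSYM_NAMES[token]
    match = CODE_POINT_TOKEN.fullmatch(token)
    if match is None:
        raise ValueError(f'unknown compose token: {token!r}')
    code_point = int(match.group(1), 16)
    # ASCII code points map to themselves, everything else gets the offset.
    return code_point if code_point < 0x80 else 0x01000000 | code_point

def parse_sequence(lhs: str) -> tuple[int, ...]:
    """Turn the left-hand side of a compose line into a tuple of key values."""
    return tuple(token_to_keyval(t) for t in re.findall(r'<([^>]+)>', lhs))

# Both spellings of the ASCII keys resolve to the same key values.
print(parse_sequence('<U0061> <minus>') == parse_sequence('<a> <U002D>'))  # True
```

With this normalization, only the ASCII spellings collapse into one sequence; `<EuroSign>` and `<U20AC>` still resolve to different key values.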

For compose sequences involving code points above the ASCII range, it is a bit more complicated.

Some of these code points do not have keysym names in /usr/include/X11/keysymdef.h.

For example ☺ U+263A WHITE SMILING FACE is not in /usr/include/X11/keysymdef.h.

For these a compose sequence like

<U263A> : "some expansion"

already works with a keyboard layout that contains U263A, which produces the key value 0x0100263A.

But there are some Unicode characters above the ASCII range which do have keysym names in /usr/include/X11/keysymdef.h. For example € U+20AC EURO SIGN.

There are two ways of writing the euro sign in keyboard layouts. Most keyboard layouts, like /usr/share/X11/xkb/symbols/de, contain EuroSign. For these keyboard layouts, the key value 0x20ac is produced; see /usr/include/X11/keysymdef.h.

For such keyboard layouts, the following compose sequence works:

<EuroSign> : "some expansion"

But in some keyboard layouts, U20AC is used instead of EuroSign, for example here:

$ grep 20AC /usr/share/X11/xkb/symbols/is
   key <AD03> { [ e,          E,          U20AC,              questiondown        ] };

When typing these keys, the key value produced is 0x010020AC, and the above compose sequence using <EuroSign> does not work. Using <U20AC> in the compose sequence works in these cases instead:

<U20AC> : "some expansion"

ibus-typing-booster already behaves just like the original Xorg compose implementation here: both treat <EuroSign> and <U20AC> as different compose sequences, and which one works depends on whether the keyboard layout uses `EuroSign` or `U20AC`.
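The behaviour described above can be illustrated with a toy compose table keyed by key values. This is a minimal sketch under the assumption that sequences are matched on exact key values; `compose_table` and `lookup` are hypothetical names, not part of ibus-typing-booster or Xorg.

```python
# Key values as described above.
EUROSIGN_KEYVAL = 0x20AC    # produced by layouts that use EuroSign
U20AC_KEYVAL = 0x010020AC   # produced by layouts that use U20AC

# Hypothetical compose table containing only the <EuroSign> sequence.
compose_table = {(EUROSIGN_KEYVAL,): 'some expansion'}

def lookup(typed):
    """Return the expansion for the typed key values, or None if nothing matches."""
    return compose_table.get(tuple(typed))

print(lookup([EUROSIGN_KEYVAL]))  # some expansion
print(lookup([U20AC_KEYVAL]))     # None

# A separate <U20AC> sequence is needed for layouts that produce 0x010020AC.
compose_table[(U20AC_KEYVAL,)] = 'some expansion'
print(lookup([U20AC_KEYVAL]))     # some expansion
```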

A similar case is ә U+04D9 CYRILLIC SMALL LETTER SCHWA. The key value 0x04d9 does not belong to this character; it is already used by a different keysym:

$ grep 0x04d9 /usr/include/X11/keysymdef.h
#define XK_kana_RU 0x04d9 /* U+30EB KATAKANA LETTER RU */
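The difference between the key value 0x04d9 and the code point U+04D9 can be made explicit by decoding key values. The following is only a sketch: `LEGACY_KEYSYMS` is a hypothetical two-entry excerpt rather than the full keysymdef.h table, and `describe_keyval` is a made-up helper.

```python
# Hypothetical excerpt: key value -> (keysym name, Unicode code point).
LEGACY_KEYSYMS = {
    0x04D9: ('kana_RU', 0x30EB),
    0x20AC: ('EuroSign', 0x20AC),
}

def describe_keyval(keyval: int) -> str:
    """Describe a key value as a keysym name or as a UXXXX code point."""
    if (keyval & 0xFF000000) == 0x01000000:
        # Key value with the 0x01000000 offset: the low bits are the code point.
        return f'U{keyval & 0x00FFFFFF:04X}'
    if keyval in LEGACY_KEYSYMS:
        name, code_point = LEGACY_KEYSYMS[keyval]
        return f'{name} (U+{code_point:04X})'
    if keyval < 0x80:
        # ASCII key values equal their code points.
        return f'U{keyval:04X}'
    return f'unknown key value {keyval:#x}'

print(describe_keyval(0x04D9))      # kana_RU (U+30EB)
print(describe_keyval(0x010004D9))  # U04D9
```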


```Python
$ python3
Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gi import require_version
>>> require_version('IBus', '1.0')
>>> from gi.repository import IBus
>>> IBus.keyval_name(0x04d9)
'kana_RU'
$ grep U04D9 /usr/share/X11/locale/en_US.UTF-8/Compose
<dead_diaeresis> <U04D9>                : "ӛ"   U04DB # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS
<Multi_key> <quotedbl> <U04D9>          : "ӛ"   U04DB # CYRILLIC SMALL LETTER SCHWA WITH DIAERESIS
$ grep kana_RU /usr/share/X11/locale/en_US.UTF-8/Compose
<Multi_key> <parenleft> <kana_RU> <parenright>  : "㋸"  U32F8 # CIRCLED KATAKANA RU
```

Here as well, <kana_RU> and <U04D9> are different compose sequences, which work with different keyboard layouts.

I think this already works in the Xorg compose implementation and in ibus-typing-booster.

So to fix this issue, only the <U00XX> way of writing compose sequences containing ASCII characters needs to be fixed in ibus-typing-booster to make it agree more closely with the Xorg compose implementation.