Closed mike-fabian closed 2 years ago
However, when the same test is done in an xterm started like this:
env XMODIFIERS=@im=ibus xterm &
the test fails and one gets only U+FEFB:
https://user-images.githubusercontent.com/2330175/188914569-e380480a-99b3-431d-a7af-e16f76e1dcb7.mp4
I.e. even though ibus has Compose support, it apparently does not handle these lines:
$ grep 'ARABIC LIGATURE' /usr/share/X11/locale/en_US.UTF-8/Compose
<UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> : "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> : "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> : "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE
The Compose support in Gtk3 and Gtk4 does not support this either.
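To make the format of these Compose lines concrete, here is a minimal sketch of how such lines could be parsed into a lookup table. The function name `parse_compose` and the simplified regular expression are assumptions for illustration only; this is not the parser used by libX11, ibus, or Gtk, and it ignores includes, escapes, and keysym results.

```python
import re

# Sample lines copied from the grep output above.
COMPOSE_LINES = '''\
<UFEFB> : "لا" # ARABIC LIGATURE LAM WITH ALEF
<UFEF7> : "لأ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA ABOVE
<UFEF9> : "لإ" # ARABIC LIGATURE LAM WITH ALEF WITH HAMZA BELOW
<UFEF5> : "لآ" # ARABIC LIGATURE LAM WITH ALEF WITH MADDA ABOVE
'''

# Simplified pattern: one or more <keysym> events, a colon, a quoted result.
LINE_RE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"([^"]*)"')


def parse_compose(text):
    """Map each keysym sequence (a tuple of names) to its output string."""
    table = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            keysyms = tuple(re.findall(r'<([^>]+)>', match.group(1)))
            table[keysyms] = match.group(2)
    return table


table = parse_compose(COMPOSE_LINES)
for keysyms, result in table.items():
    # Show the code points of each result string.
    print(keysyms, [f'U+{ord(c):04X}' for c in result])
```

Note that every sequence here consists of a single keysym, and none of them starts with Multi_key or a dead key.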
If I have a `~/.XCompose` file containing:
$ cat ~/.XCompose
# %H expands to the user's home directory (the $HOME environment variable)
# %L expands to the name of the locale specific Compose file (i.e.,
# "/usr/share/X11/locale/<localename>/Compose")
# %S expands to the name of the system directory for Compose files (i.e.,
# "/usr/share/X11/locale")
#include "%L"
include "/%L"
and then test the Gtk3 Compose support in
env GTK_IM_MODULE=gtk-im-context-simple gedit
and the Gtk4 Compose support in
env GTK_IM_MODULE=gtk-im-context-simple gnome-text-editor
and type the `b` key while the Arabic keyboard layout is active, I get U+FEFB.
So it looks like the Compose support in Gtk3/Gtk4 does not support this either.
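The `%H`, `%L` and `%S` expansions described in the comments of the `~/.XCompose` file above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `expand_compose_include` and its default arguments are hypothetical, and the real lookup of the locale-specific Compose file in libX11 is more involved than joining the locale name onto the system directory.

```python
import os


def expand_compose_include(path, locale_name='en_US.UTF-8',
                           system_dir='/usr/share/X11/locale'):
    """Expand %H, %L and %S in an include path from a Compose file.

    Simplified: assumes the locale Compose file lives at
    <system_dir>/<locale_name>/Compose.
    """
    replacements = {
        '%H': os.environ.get('HOME', ''),
        '%L': f'{system_dir}/{locale_name}/Compose',
        '%S': system_dir,
    }
    for token, value in replacements.items():
        path = path.replace(token, value)
    return path


print(expand_compose_include('%L'))
print(expand_compose_include('%S/en_US.UTF-8/Compose'))
```

With the defaults above, both calls print the same path, the one searched with grep earlier.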
Another compose implementation is in ibus-typing-booster and this does not support this either (that’s why I opened this issue here).
In case of ibus-typing-booster, I can obviously see why:
https://github.com/mike-fabian/ibus-typing-booster/blob/main/engine/hunspell_table.py#L5533
            return False
        if (not self._typed_compose_sequence
                and not key.name == 'Multi_key'
                and not key.name.startswith('dead_')):
            if DEBUG_LEVEL > 1:
                LOGGER.debug('Not in a compose sequence.')
            return False
I.e. ibus-typing-booster currently does not treat anything that does not start with `Multi_key` or `dead_` as a valid compose sequence.
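One way to avoid hard-coding `Multi_key` and `dead_` would be to ask whether the keys typed so far form a prefix of any known compose sequence. The sketch below illustrates that idea only; `COMPOSE_TABLE` and `starts_compose_sequence` are hypothetical names, and this is not necessarily how ibus-typing-booster implements its fix.

```python
# Hypothetical compose table: keysym-name sequences mapped to output strings.
COMPOSE_TABLE = {
    ('Multi_key', 'a', 'e'): 'æ',
    ('dead_acute', 'e'): 'é',
    ('UFEFB',): '\u0644\u0627',  # LAM + ALEF, as in the libX11 Compose file
}


def starts_compose_sequence(typed, key_name, table=COMPOSE_TABLE):
    """Return True if typed + key_name is a prefix of some known sequence."""
    candidate = tuple(typed) + (key_name,)
    return any(seq[:len(candidate)] == candidate for seq in table)


print(starts_compose_sequence([], 'UFEFB'))        # one-key Arabic sequence
print(starts_compose_sequence([], 'b'))            # ordinary key
print(starts_compose_sequence(['Multi_key'], 'a')) # classic Multi_key sequence
```

With a check like this, a one-key sequence such as `<UFEFB>` is recognized even though it starts with neither `Multi_key` nor a dead key.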
I will try to fix this in typing booster, but that will only make the Arabic xkb keyboard work correctly in typing-booster of course.
To make it work correctly elsewhere, the Compose support in ibus and in Gtk3/Gtk4 needs to be fixed as well.
This commit in libX11 from 2008-06-20 added the Arabic compose sequences:
https://gitlab.freedesktop.org/xorg/lib/libx11/-/commit/21e464ec682ab23ba20ddf6bd72c6db214cfbe01
commit 21e464ec682ab23ba20ddf6bd72c6db214cfbe01
Author: Khaled Hosny <khaledhosny@eglug.org>
Date: Thu Jun 19 18:26:11 2008 -0400
NLS: Add Arabic Lam-Alef ligature compose sequences (bug #16426)
Add some Arabic digraphs to utf-8 locales with a Compose.pre
Signed-off-by: James Cloos <cloos@jhcloos.com>
Here is the original bug to add these Arabic compose sequences:
https://bugs.freedesktop.org/show_bug.cgi?id=16426
Khaled Hosny 2008-06-19 05:50:06 UTC
Created [attachment 17228](https://bugs.freedesktop.org/attachment.cgi?id=17228) [[details]](https://bugs.freedesktop.org/attachment.cgi?id=17228&action=edit)
Arabic Compose rules
Arabic keyboard needs the ability to have one key stroke producing two code points, see #8195.
Adding the attached rules to Compose files of UTF-8 locales is needed in order to fix this.
Original discussion about the problem: https://bugs.freedesktop.org/show_bug.cgi?id=8195 migrated to gitlab: https://gitlab.freedesktop.org/xorg/xserver/-/issues/346
The test builds of ibus-typing-booster >= 2.18.17 at
https://copr.fedorainfracloud.org/coprs/mfabian/ibus-typing-booster/builds/
have a fix for this problem and now also support compose sequences that do not start with `Multi_key` or a dead key.
This video shows that it works for the special Arabic compose sequence:
- Type the `b` key: a “b” appears in gedit.
- With the Arabic keyboard layout, type the `b` key into gedit: the character ﻻ U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM appears. One can confirm that it is a single character by deleting it again with `Backspace`; a single `Backspace` is enough to delete it.
- With ibus-typing-booster, type the `b` key into gedit. Now the Arabic keyboard layout is still used, but the Compose support comes from ibus-typing-booster and not from ibus or Gtk. In gedit the string "لا" U+0644 ARABIC LETTER LAM U+0627 ARABIC LETTER ALEF appears. This looks very similar to what we got before with the Arabic keyboard layout used without ibus-typing-booster, but we can confirm that there are actually two characters by deleting again with `Backspace`. After the first `Backspace`, ل U+0644 ARABIC LETTER LAM remains, and two `Backspace` presses are needed to delete the complete string.

https://user-images.githubusercontent.com/2330175/189303691-334b3da4-c81a-46c7-9036-7d1f8199d935.mp4
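The Backspace check described above can also be done in code. The following sketch uses only Python's standard `unicodedata` module to show that the ligature is one code point while the composed result is two, and that the ligature's compatibility decomposition (NFKC) yields exactly the two-letter string. The variable names are chosen for illustration.

```python
import unicodedata

ligature = '\ufefb'         # what the Arabic xkb layout produces
two_chars = '\u0644\u0627'  # what the Compose sequence produces

for text in (ligature, two_chars):
    # Print the length and the name of every code point in the string.
    print(len(text), [f'U+{ord(c):04X} {unicodedata.name(c)}' for c in text])

# The ligature has a compatibility decomposition to LAM + ALEF.
print(unicodedata.normalize('NFKC', ligature) == two_chars)
```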
Discovered while researching https://bugzilla.redhat.com/show_bug.cgi?id=2122899
On the AB05 key (that is where the `b` is on an English US layout), the standard Arabic xkb layout produces U+FEFB. That is not desired; the desired result is U+0644 U+0627.
But with xkb layouts, it is not possible to output more than one keysym per keystroke.
https://www.freedesktop.org/wiki/Software/XKeyboardConfig/XKB2Dreams/ talks about
3. Support for scenarios "multiple keypresses - one keysym" and "single keypress - multiple keysyms".
If xkb were improved like this, one could output U0644 U0627 when pressing the AB05 key.
But that is a “dream” and might never happen.
So I suggested to the user to use ibus-m17n or ibus-typing-booster with ar-kbd.mim instead. ar-kbd.mim emulates the Arabic keyboard on top of a US English keyboard layout using m17n-lib and can output any string for any keypress. It outputs the desired U+0644 U+0627 when typing `b`.
But the user had also noticed that it just happens to work in Qt apps. I could not figure out why; it appeared very mysterious to me.
Today, after more discussion with the user avidseeker, we finally stumbled on the reason: it is because of Compose support!
So even though the xkb keyboard for Arabic produces U+FEFB, the Compose support then replaces this by U+0644 U+0627.
That can be tested by starting xterm in a way that makes sure the Compose support of X11 is used and not the Compose support of ibus.
Then in the xterm, type
echo -n b | iconv -f utf8 -t utf16le | od -x
and we see that the b produces U+0062, which is correct.
Switch to the Arabic keyboard, type “arrow up” to get the
echo -n b | iconv -f utf8 -t utf16le | od -x
line back, go back to the `b` with “arrow left”, type `b`, and now the output shows two characters. I.e. even though the keyboard surely outputs only U+FEFB, the Compose support of Xorg transforms this into U+0644 U+0627.
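For readers without `iconv` and `od` at hand, the same inspection can be approximated in Python. This is a sketch only; the helper name `utf16_code_units` is made up for illustration. Its output is comparable to the 16-bit words that `od -x` prints for UTF-16LE input on a little-endian machine.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as 4-digit hex strings."""
    data = text.encode('utf-16-le')
    return [f'{int.from_bytes(data[i:i + 2], "little"):04x}'
            for i in range(0, len(data), 2)]


print(utf16_code_units('b'))             # Latin b from the US layout
print(utf16_code_units('\ufefb'))        # ligature from the Arabic xkb layout
print(utf16_code_units('\u0644\u0627'))  # result after X11 Compose
```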