xkbcommon / libxkbcommon

keymap handling library for toolkits and window systems
https://xkbcommon.org

Modern Composition #426

Open wismill opened 8 months ago

wismill commented 8 months ago

Modern Composition

NOTE: This document is a draft.

Introduction

Compose sequences are already powerful, but they look limited compared to what macOS offers.

macOS keyboard layouts define dead keys with a state machine, which is quite powerful. In fact, xkbcommon's current Compose implementation also uses a state machine internally, but we do not use its full power.
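To illustrate what "state machine" means for legacy Compose, here is a minimal Python sketch. The names `Node`, `build` and `feed` are hypothetical, keysyms are plain strings, and this is not xkbcommon's actual data structure. Legacy sequences form a trie: every intermediate node is a state, only leaves carry a result, and any keysym without a matching child cancels the sequence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    result: Optional[str] = None  # legacy format: only leaves carry a result


def build(rules: Sequence[Tuple[Sequence[str], str]]) -> Node:
    """Insert each (keysyms, result) rule into a trie rooted at the idle state."""
    root = Node()
    for keysyms, result in rules:
        node = root
        for keysym in keysyms:
            node = node.children.setdefault(keysym, Node())
        node.result = result
    return root


def feed(root: Node, keysyms: Sequence[str]) -> List[str]:
    """Run keysyms through the trie and return the committed outputs."""
    node, out = root, []
    for keysym in keysyms:
        child = node.children.get(keysym)
        if child is None:
            if node is root:
                out.append(keysym)  # not composing: pass the key through
            node = root  # composing: the unmatched keysym cancels and is discarded
        elif child.result is not None:
            out.append(child.result)  # reached a leaf: commit and go idle
            node = root
        else:
            node = child  # intermediate state: keep composing
    return out
```

Only leaves produce output and every unmatched keysym cancels the sequence; these are the limitations this proposal wants to lift.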

I propose we change that and create a new format in order to:

Proposed changes

New Compose file format

The new Compose file format is based on a restricted subset of YAML 1.2.

Documented example:

# First document is reserved for configuration
compose version: 2  # mandatory format version. Legacy files have implicit version: 1.
--- # Start a new YAML document
# States are identified by a name. TODO: recommendations for standard dead keys
acute:
  # Optional corresponding keysyms. If none: custom state
  keysym: dead_acute
  # If set, the following string is displayed while composing
  feedback: "´"
  # State transitions
  transitions:
    # Implicit entry of one character.
    # Equivalent to legacy: <dead_acute> <a>: "á" aacute
    # Equivalent to new: {char: á, keysym: aacute, next: __none__}
    a: "á"
    # Implicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <q>: "q́"
    # Equivalent to new: {string: "q́", keysym: __none__, next: __none__}
    q: "q́"
    # Explicit entry of one character, without an explicit keysym.
    # Equivalent to legacy: <dead_acute> <e>: "é" eacute
    # Equivalent to new: {char: é, keysym: eacute, next: __none__}
    e: {char: "é"}
    # Explicit entry of one character with keysym.
    # Equivalent to legacy: <dead_acute> <i>: "í" iacute
    # Equivalent to new: {char: í, keysym: iacute, next: __none__}
    i: {char: "í", keysym: iacute}
    # Explicit entry of multiple characters.
    # Equivalent to legacy: <dead_acute> <x>: "x́"
    # Equivalent to new: {string: "x́", keysym: __none__, next: __none__}
    x: {string: "x́"}
    # Chained dead key
    # Equivalent to legacy:
    #   <dead_acute> <dead_macron> <e>: "ḗ" U1E17
    #   <dead_acute> <dead_macron> <o>: "ṓ" U1E53
    # Equivalent to new: {char: __none__, next: macron_and_acute}
    dead_macron: {next: macron_and_acute}
    # Sequences (avoid creating explicit intermediate states, e.g. “double_acute”)
    # Equivalent to legacy: <dead_acute> <dead_acute> <o>: "ő" odoubleacute
    dead_acute o: "ő" # U+0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
    # Equivalent to legacy: <dead_acute> <dead_acute> <u>: "ű" udoubleacute
    dead_acute u: "ű" # U+0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE
    # Loop. Equivalent to: {next: acute}
    # No legacy equivalent
    dead_acute dead_acute: {next: __loop__}
    # TODO: how to handle overlaps?
    dead_acute dead_acute o: 🦧
    # Wildcard (aka “terminator”): match any input.
    # Here we match any input, then discard it and stop.
    # This is the default behaviour (no need to set it) and
    # corresponds to the legacy behaviour.
    _: {next: __none__}
macron_and_acute:
  # NOTE: custom state (no associated keysym)
  feedback: "\u02DD" # U+02DD DOUBLE ACUTE ACCENT
  transitions:
    e: "ḗ" # U+1E17 LATIN SMALL LETTER E WITH MACRON AND ACUTE
    o: "ṓ" # U+1E53 LATIN SMALL LETTER O WITH MACRON AND ACUTE
    # Wildcard: match any input, discard it, output "\u02DD" and stop
    _: {char: "\u02DD"}
compose:
  keysym: Multi_key
  transitions:
    # Some classical XCompose sequences
    period period: "…"
    period minus: "·"
    period equal: "•"
    f o r a l l: "∀" # U+2200 FOR ALL
    # Chained dead key (level 1)
    m: {next: math}
math:
  keysym: 0x11000000 # custom keysym
  transitions:
    # Chained dead keys (level 2)
    i: {next: math-italic}
    b: {next: math-bold}
    s: {next: math-double-struck}
    # Wildcard: match any input, output it unchanged, then stop
    _: {keysym: __input__}
math-italic:
  transitions:
    a: {char: "𝑎", next: __loop__}
    i: {char: "𝑖", next: __loop__}
    # Wildcard: match any input, output it unchanged, then loop
    _: {keysym: __input__, next: __loop__}
math-bold:
  transitions:
    a: "𝐚"
    i: "𝐢"
    # Wildcard with built-in filters
    _:
      # Discard but keep looping
      - {filter: __letter__, next: __loop__}
      # Output unchanged and loop
      - {filter: __number__, keysym: __input__, next: __loop__}
      - {filter: __punctuation__, keysym: __input__, next: __loop__}
      # Output unchanged and stop
      - {keysym: __input__}
math-double-struck:
  feedback: 𝔸
  transitions:
    e: {char: 𝕖}
    E: {char: 𝔼, keysym: U1D53C}
--- # Start a new YAML document
# Include locale Compose
!include "%L"
--- # Start a new YAML document
# Include custom Compose file
!include "%H/path/to/other-compose-file"
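The semantics described in the comments above can be prototyped with a small interpreter. The Python sketch below is a hypothetical toy, not a proposed API. It assumes the states are already parsed into dicts (so no YAML library is needed), treats keysyms as plain strings and single-character keysym names as their own characters, and resolves the overlap TODO with one arbitrary policy: an exact match wins over a longer sequence sharing the same prefix.

```python
import unicodedata

NONE, LOOP, INPUT = "__none__", "__loop__", "__input__"


def _single_char(pred):
    # Sketch assumption: only one-character keysym names ("a", "7", ",") are
    # classified; names such as "period" would need a keysym-to-char table.
    return lambda keysym: len(keysym) == 1 and pred(keysym)


# Built-in wildcard filters, with hypothetical semantics.
FILTERS = {
    "__letter__": _single_char(str.isalpha),
    "__number__": _single_char(str.isdigit),
    "__punctuation__": _single_char(lambda c: unicodedata.category(c).startswith("P")),
}


class Composer:
    """Toy interpreter for the proposed format; states are pre-parsed dicts."""

    def __init__(self, states):
        self.states = states
        # Keysyms that enter a state from idle, e.g. "dead_acute" -> "acute".
        self.entry = {s["keysym"]: name for name, s in states.items() if "keysym" in s}
        self.state = None  # current state name; None means idle
        self.pending = ()  # keysyms consumed so far by a multi-key transition

    def feedback(self):
        """String to display while composing, if the state defines one."""
        return self.states[self.state].get("feedback") if self.state else None

    def _rule(self, keysym):
        """Return the transition rule for `keysym`, or None to keep waiting."""
        transitions = self.states[self.state].get("transitions", {})
        seq = " ".join(self.pending + (keysym,))
        if seq in transitions:  # exact match wins over longer sequences
            self.pending = ()
            rule = transitions[seq]
            return {"string": rule} if isinstance(rule, str) else rule
        if any(key.startswith(seq + " ") for key in transitions):
            self.pending += (keysym,)  # prefix of a multi-key transition
            return None
        self.pending = ()
        wildcard = transitions.get("_", {"next": NONE})  # default: discard, stop
        for rule in wildcard if isinstance(wildcard, list) else [wildcard]:
            test = FILTERS.get(rule.get("filter"))
            if test is None or test(keysym):  # no (known) filter: always matches
                return rule
        return {"next": NONE}

    def feed(self, keysym):
        """Feed one keysym and return the text to commit ("" if none)."""
        if self.state is None:
            if keysym in self.entry:
                self.state = self.entry[keysym]
                return ""
            return keysym  # not composing: pass the key through
        rule = self._rule(keysym)
        if rule is None:
            return ""
        out = rule.get("char") or rule.get("string") or ""
        if rule.get("keysym") == INPUT:
            out += keysym  # __input__: echo the keysym that was just fed
        target = rule.get("next", NONE)
        if target == NONE:
            self.state = None
        elif target != LOOP:
            self.state = target
        return out
```

With this exact-match policy, `dead_acute dead_acute` loops back to `acute` before `dead_acute dead_acute o` can ever match, which is precisely the overlap flagged by the TODO.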

Partially converted en_US.UTF-8/Compose:

compose version: 2
---
acute:
  keysym: dead_acute
  feedback: "´"
  transitions:
    space: "'"
    dead_acute: "´"
    A: Á
    E: É
    I: Í
    J: J́ # LATIN CAPITAL LETTER J plus COMBINING ACUTE
    O: Ó
    # …
    dead_diaeresis: {next: diaeresis_and_acute}
    Multi_key quotedbl: {next: diaeresis_and_acute}
    Udiaeresis: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    # …
    dead_abovering: {next: abovering_and_acute}
    Multi_key o: {next: abovering_and_acute}
    Aring: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    # …
diaeresis_and_acute:
  transitions:
    space: "΅" # GREEK DIALYTIKA TONOS
    U: Ǘ # LATIN CAPITAL LETTER U WITH DIAERESIS AND ACUTE
    # …
abovering_and_acute:
  transitions:
    A: "Ǻ" # LATIN CAPITAL LETTER A WITH RING ABOVE AND ACUTE
    # …
cedilla:
  keysym: dead_cedilla
  transitions:
    space: "¸"
    c: ç
    C: Ç
    # …
compose:
  keysym: Multi_key
  transitions:
    apostrophe: {next: acute}
    comma: {next: cedilla}
    # TODO: Check how to handle the following (they overlap with the previous entry, but are unrelated to cedilla)
    #       Maybe use: `comma: {filter: __letter__, next: cedilla}` ?
    comma apostrophe: "‚" # SINGLE LOW-9 QUOTATION MARK
    comma quotedbl: "„" # DOUBLE LOW-9 QUOTATION MARK
    comma minus: "¬" # NOT SIGN
    # …
# …
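A partial conversion like the one above could be bootstrapped mechanically. The hypothetical Python helper below groups legacy rules by their first keysym and keeps the remaining keysyms as a multi-key transition, so it never creates intermediate states such as `diaeresis_and_acute`. State names are simply the first keysym, since naming standard dead-key states is still a TODO. It only handles the basic `<a> <b> : "string" keysym` form and does not decode escape sequences.

```python
import re

# Matches `<k1> <k2> ... : "string" [keysym]`; anything after it (comments) is ignored.
RULE = re.compile(r'^\s*((?:<[^>]+>\s*)+):\s*"((?:[^"\\]|\\.)*)"\s*(\w+)?')


def convert(lines):
    """Group legacy Compose rules into version-2-style states (a sketch)."""
    states = {}
    for line in lines:
        match = RULE.match(line)
        if match is None:
            continue  # blank lines, comments, include directives, ...
        keysyms = re.findall(r"<([^>]+)>", match.group(1))
        first, rest = keysyms[0], " ".join(keysyms[1:])
        string, keysym = match.group(2), match.group(3)
        state = states.setdefault(first, {"keysym": first, "transitions": {}})
        if keysym is None:
            entry = string  # implicit entry, as in `q: "q́"`
        else:
            # One character -> `char`, several -> `string`, as in the example above.
            entry = {"char" if len(string) == 1 else "string": string, "keysym": keysym}
        state["transitions"][rest] = entry
    return states
```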

X11 data

We could reuse the new format for the Compose file templates in the libX11 repository:

whot commented 7 months ago

A fairly high-level question: at what point does compose become an input method? And would a more complex compose be better left to an input method implementation (e.g. ibus)?

AFAICT XKB compose sequences pretty much pre-date input methods because back in the early 1990s there was little consideration of CJK languages (and others that need IM). But modern desktops enable IM by default, e.g. I always get annoyed when I have a new GNOME session and my shortcuts produce emojis instead.

So, without going into technical details I would probably argue that putting compose into IM implementations might be a more scalable approach?

wismill commented 7 months ago

@whot

A fairly high-level question: at what point does compose become an input method?

It is indeed an input method (see definition on Wikipedia). Maybe the oldest one?

And would a more complex compose be better left to an input method implementation (e.g. ibus)?

In fact it is already implemented as an input method in Gtk and Qt. But while Qt uses the xkbcommon implementation underneath, Gtk decided to go its own way in ibus. I am not sure why they took this decision and whether it is final. Could it be that compose support in ibus predates compose support in xkbcommon?

e.g. ibus

I really dislike ibus. I use the Plasma desktop and it does not integrate well. It is really Gnome-focused. Not to mention that it is not efficient (CPU, memory). But I admit it does a few things better: support for overlapping Compose sequences (see #398 to implement this in xkbcommon and in Qt) and support for Ctrl+Shift+U to input Unicode code points.
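For reference, "overlapping" means that one sequence is a prefix of another, e.g. `<Multi_key> <a>` and `<Multi_key> <a> <b>` (hypothetical rules). One possible policy is longest match, sketched below in Python on a complete key buffer: keep extending while the keys are a prefix of some rule, then commit the longest complete match and resume after it. This is not necessarily what ibus or #398 implement; a live input method cannot look ahead, so it would typically have to commit the shorter result once the next key fails to extend it.

```python
from typing import Dict, List, Sequence, Tuple


def resolve(rules: Dict[Tuple[str, ...], str], keysyms: Sequence[str]) -> List[str]:
    """Longest-match resolution of overlapping sequences over a full key buffer."""
    # Every proper prefix of some rule: reaching one means "keep reading".
    prefixes = {seq[:n] for seq in rules for n in range(1, len(seq))}
    out: List[str] = []
    i = 0
    while i < len(keysyms):
        best = None  # (end index, result) of the longest complete match from i
        j = i
        while j < len(keysyms):
            candidate = tuple(keysyms[i:j + 1])
            if candidate in rules:
                best = (j + 1, rules[candidate])
            if candidate not in prefixes:
                break  # no longer rule can start with these keys
            j += 1
        if best is None:
            out.append(keysyms[i])  # no rule applies: pass the key through
            i += 1
        else:
            out.append(best[1])  # commit the longest match, resume after it
            i = best[0]
    return out
```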

So, without going into technical details I would probably argue that putting compose into IM implementations might be a more scalable approach?

I wish we could have a reference implementation of the Compose machinery for all input method frameworks. I think my proposal is not disruptive (apart from the new text format): the Compose feature is essentially a state machine, and I would like to lift some of its current limitations.

I see the following next steps:

whot commented 7 months ago

Gtk decided to go its own way in ibus [...] Could it be that compose support in ibus predates compose support in xkbcommon?

yep, GTK compose handling pre-dates libxkbcommon by... quite a number of years :)

Is https://github.com/xkbcommon a good place to develop this hypothetical new library?

AIUI libxkbcommon is on GitHub because at the time it was the only git forge available (freedesktop was still on Bugzilla + ssh-git). xkbcommon is also severely lacking developer time, so it may be better to host this "closer" to the users to take advantage of the user set (or even on freedesktop GitLab). But otherwise I don't see a reason not to host this in this namespace.

wismill commented 7 months ago

@whot Has anyone proposed a unified implementation (parsing, state handling)? I could not find any evidence after a quick check. Any advice on how to start the discussion with Gtk devs?

whot commented 7 months ago

Has anyone proposed a unified implementation (parsing, state handling)?

Not that I know of, but let's see if @ebassi is listening (and can answer the second question) :)

ebassi commented 7 months ago

Input methods are an area of computing and UX where people have Strong Opinions™, especially when it comes to workflow issues; for instance, you'll often hear something to the effect of "I really dislike [project]", for one reason or another.

[Ibus] is really Gnome-focused.

It's actually the other way around: about 12 years ago, GNOME picked a single input method framework for a variety of reasons, and then designed the whole thing around it, instead of just letting people choose and avoiding committing to a specific UX. Of course, it's not without strife: ibus has its own shortcomings, mainly at the intersection of deeply entrenched workflows (see above, re: xkcd) and UI design.

Has anyone proposed a unified implementation (parsing, state handling)?

Not that I know of.

GTK has its own XCompose file parser, because it can handle compose sequences internally and people wanted to keep their custom files from 30 years ago working even without ibus; internally it's implemented as part of the "simple" input method object, which is used to handle things like Unicode and dead keys when ibus is not available, or on non-Unix platforms, like Windows and macOS.

ebassi commented 7 months ago

As a side note: GTK isn't going to drop parsing the existing XCompose files, but we're not going to add a new compose file format, especially one using YAML. If a new format is defined, support for it will have to be implemented inside ibus or inside a separate, out of tree input method module for GTK.

wismill commented 7 months ago

@ebassi thank you, this is insightful!

you'll often hear something to the effect of "I really dislike [project]", for one reason or another.

Sorry about that. It was not constructive.

GTK has its own XCompose file parser, because it can handle compose sequences internally and people wanted to keep their custom files from 30 years ago working even without ibus; internally it's implemented as part of the "simple" input method object, which is used to handle things like Unicode and dead keys when ibus is not available, or on non-Unix platforms, like Windows and macOS.

So the conclusion is probably: “if it is not broken, do not fix it”. Fair enough, let’s keep multiple implementations.

but we're not going to add a new compose file format, especially one using YAML.

What is the issue with YAML in this case? I would not mind using another format. Do you have another format in mind?

If a new format is defined, support for it will have to be implemented inside ibus or inside a separate, out of tree input method module for GTK.

So the only way to enhance Compose seems to be to develop a dedicated library for the new format, then build engines for both Qt IM and ibus?

ebassi commented 7 months ago

What is the issue with YAML in this case? I would not mind using another format. Do you have another format in mind?

The issue with YAML is that we don't have a parser for it, and adding libyaml as a dependency to GTK is not going to happen.

For out of tree input methods we don't have the same restrictions, of course.

So the only way to enhance Compose seems to be to develop a dedicated library for the new format, then build engines for both Qt IM and ibus?

That would be my recommendation.

wismill commented 7 months ago

For the record: Gtk ticket “Use xkbcommon-compose instead of custom compose table mechanism”.