Call for possible collaboration

kermitfrog / inputmangler

Inputmangler is a daemon that intercepts and transforms linux input events, depending on the active window. It aims to be highly configurable, reliable and not too hard to use. You can use it to remap those extra mouse buttons, properly utilize a footswitch or even remap your second keyboard to trigger all kinds of shortcuts! If you are left-handed, you can switch left and right mouse buttons for applications that ignore your Desktops settings (like dosbox and fs-uae). And you can have a different configuration for each window! It is also capable of translating Text from the network to key presses.

Other

5 stars 1 forks source link

Call for possible collaboration #2

Open kermitfrog opened 1 year ago

kermitfrog commented 1 year ago

Hi,

I'm the developer of a tool called inputMangler, which transforms input events on linux. After a few years of other priorities I want to continue development (well.. rewrite it from scratch actually..). As I like to avoid duplicate work, I had a look around the net to see if someone else started another project like mine. I found a few which at least do something similiar and, if you're mentioned at the end of this post, one of them is yours.

While all those projects seem to have more or less different goals and approaches, there still might be enough common ground for collaboration. So this thread is about exploring possibilities to work together.

In the next post, I will write an overview of my goals. I invite everyone interested to do the same.

Afterwards we can compare those and discuss if it would make sense to • put some base code in a common library • merge projects (may be unlikely, but .. maybe) • just share experience on strange input-related problems ;D

Links to the projects: https://github.com/kermitfrog/inputmangler https://github.com/sezanzeb/input-remapper https://github.com/samvel1024/kbct https://github.com/shiro/map2 https://github.com/rvaiya/keyd https://github.com/snyball/Hawck https://github.com/KarsMulder/evsieve

And the people that I hope will have a look at this after receiving a notification for being mentioned: @sezanzeb, @samvel1024, @shiro, @rvaiya, @snyball, @KarsMulder

KarsMulder commented 10 months ago

Integration with accessibility features

Here is a short sample of what third-party voice assistants can get done on Windows: https://youtu.be/G2m0kUkYHuQ?t=618 (10:18–11:24)

Notice how he uses commands like "click blank document" to make the voice assistant click whatever clickable part of the UI contains the word "blank document", or how he can adjust the text he already typed with commands like "capitalize that".

Of course, you can imagine how handy it would be for macro scripts to be able to issue commands like "click whatever button is called [...]".

We cannot accomplish integration like what was shown above with just support from the Wayland compositor; that would require cooperation from the GUI toolkits like GTK and Qt. I wonder what kind of accessibility Qt and GTK already have built in. Do they already expose it to the compositor, or would they be able to do so with another protocol extension?

I wonder if we can design the event mapper protocol to be able to take advantage of any accessibility information available, or become able to do so in the future after.

sezanzeb commented 10 months ago

"read this keymap, add a character あ corresponding to a currently unused scancode, and then serialize the result as a new XKB file"

Mapper A can associate character あ with the previously-unused scancode 250

I did that at some point, more or less: https://github.com/sezanzeb/input-remapper/blob/xkb/keymapper/injection/xkb.py

However https://github.com/xkbcommon/libxkbcommon/issues/223#issuecomment-809309962

Wayland does not implement a generic way for clients to change the keymapping; you'd have to work through environment-specific configuration API.

This does mean that each mapper should know ahead of time which keys it could possibly desire to generate

Which might be impossible with mapper scripts that allow sophisticated user-defined custom scripts

KarsMulder commented 10 months ago

According to The Wayland Book [CC-BY-SA 4.0]:

Note that the server can send a new keymap at any time, and all future key events should be interpreted in that light.

So it seems like the protocol already expects clients to be able to deal with changes in the keymap. All we need is a new protocol to tell the compositor that a new keymap should be used.

sezanzeb commented 10 months ago

"read this keymap, add a character あ corresponding to a currently unused scancode, and then serialize the result as a new XKB file"

So it seems like the protocol already expects clients to be able to deal with changes in the keymap. All we need is a new protocol to tell the compositor that a new keymap should be used.

That would be nice. Simple wayland input remapping utilities might already exist at this point if this was possible.

Regardless of how more sophisticated mapping tools can be made to work with wayland, the above might already be an improvement.

So, should we ask wayland developers to consider this? Mailing list idk?

The clients on their turn are expected to individually link to libxkbcommon to turn those scancodes into characters.

Other events like joysticks are probably not passing through libxkbcommon? Because maybe one could hook a mapping script into libxkbcommon somehow. Maybe libxkbcommon can be modified to provide something like a devnode for reading scancodes and writing characters. But it's probably synchronous and you can't really do anything funky with it (like writing multiple characters with delay), isn't it?

KarsMulder commented 10 months ago

Regardless of how more sophisticated mapping tools can be made to work with wayland, the above might already be an improvement.

So, should we ask wayland developers to consider this? Mailing list idk?

According to the original draft image, the programs that are likely to change the keymap are also the programs that are likely to change the events themselves:

The programs that map the events need to be put in some order in a consistent way, lest there is only some% chance that your whole setup works during any individual reboot. To that end, we need to figure out a way to order the event mapping programs, and that same ordering system will probably be reused for ordering the keymap changes when multiple programs want to modify the keymap.

And, of course, we still need to figure out how to authenticate the programs that are or are not allowed to map events or change the keyboard layout. Wayland devs don't want that programs running in a sandbox becomes able to keylog the whole system just because they're allowed to display stuff on screen.

The point is that the event mapping protocol and the keymap-changing protocol will probably end up both relying on some common basis. I think it is best to tackle the event-mapping problem and the keymap-changing problem at the same time, than to rush one part of the solution, only to later discover that it doesn't interact nicely with the other half of the solution.

In case we end up giving up on finding an event-mapper protocol, then it may be a good time to propose an independent keymap-changing protocol.

KarsMulder commented 10 months ago

Maybe libxkbcommon can be modified to provide something like a devnode for reading scancodes and writing characters. But it's probably synchronous and you can't really do anything funky with it (like writing multiple characters with delay), isn't it?

libxkbcommon's API involves the programmer passing individual scancodes to functions like xkb_state_key_get_one_sym or xkb_state_key_get_utf8, and those functions return whatever corresponds to that specific scancode given the state of the keyboard. I don't think we should mess those functions.

kermitfrog commented 10 months ago

Wayland (..) would have to recognize that a unicode character is included in the event payload and use that, instead of looking into the keymap.

Considering the issue with special characters and application-preset-switching, I doubt that a solution that only works via evdev will be really satisfying for everyone. Would it make sense to open an issue on their repo and ask them what they think? https://gitlab.freedesktop.org/wayland/wayland

Mapping to keys not on the keyboard

I think the big question is where to inject unicode.. kernel-level makes no sense to me; XKB-level/libinput might maybe work. Some things to think about:

when writing applications with Qt, I can also get the kernel-level keycode of the event.. what keycode would I get for a unicode character?
will such events cause problems for some applications?
how to deal with unicode extended grapheme clusters? (https://tonsky.me/blog/unicode/)

I think what we maybe should do first, is to find out how IMEs work and whether we can use that to write arbitary characters.

Long story short, it seems like we need to escalate this no matter what, and I think the kernel is the place to go, because Wayland is definitely not.

I agree on kernel vs. wayland, but after reading all the new stuff by @KarsMulder, I wonder if we might need to contact the libinput developers as well.

Wayland protocol-level mapping vs Input Daemon

& Getting a new Kernel API would certainly solve issues regarding to two evdev mapper scripts working together, but there are some problems that we're simply not getting solved on an evdev level: [..] And probably some more. Frankly, if we intend to keep our Wayland keymappers working on evdev level, we will never get close to the amount of input customization that Windows offers.

As far as I understand, the basic input stuff on wayland is handled by libinput. Compositors are free to implement some more complicated stuff - but that would lead to fragmentation and because of that is probably better left to another library or the end-user applications. Otherwise some stuff will only work on certain DEs.

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

For our purposes, I think libinput is a much better layer to put a mapping API than wayland. It also reads directly from evdev, so it might be the one place where pretty much anything an input mapper would do comes together. It is supposed to be boring and leave complicated stuff to other programs by design, which might make the devs resist the addition of the protocol we need.. but I believe we have good arguments to justify it..

Integration with accessibility features

[..] I wonder what kind of accessibility Qt and GTK already have built in. Do they already expose it to the compositor, or would they be able to do so with another protocol extension? & I wonder if we can design the event mapper protocol to be able to take advantage of any accessibility information available, or become able to do so in the future after.

I tried to find out how this works and so far understand that there is a protocol called AT-SPI2 (https://www.freedesktop.org/wiki/Accessibility/) which seems to be implemented by Qt , GTK and possibly other toolkits.

When an application starts while some accessibility program is running (not sure how this is checked), that application exports a DBUS interface which exposes metainformation about the GUI as well as some ways to interact with it. See here (https://doc.qt.io/qt-6/qml-qtquick-accessible.html#details) for some information on how it's done in Qt.

I don't think the compositor has anything to do with it.

pallaswept commented 10 months ago

Just wanted to say that if I seem quiet it's not because I'm ignoring all this, it's because you guys are like, light years ahead of me on this, and I'm kinda following along behind you. I really appreciate all the effort you're putting in and sharing your experience and know-how on this. If I'm quiet, it's not because I'm ignoring all that you're giving, it's because I'm standing in awe and appreciation <3

KarsMulder commented 10 months ago

The current Input Method Editor protocols

It appears that fcitx5 uses the zwp_input_method_context protocol to communicate with the Wayland compositor. The Wayland compositor then allows it to take control of textfields in client applications that use the zwp_text_input protocol.

If all we cared about was mapping text input (which is not the case), then the zwp_text_input protocol does offer some nice features:

It allows you to read and change the content of the text fields; (You know, the thing IME's are supposed to do.)
It allows you to send a grab_keyboard request to block the client from receiving keyboard keys;
It allows you to send a keysym request to directly send an XKB keysym, which can be an Unicode codepoint and does not undergo any transformation based on the active keymap.

As far as keyboard mapping goes, this sounds good so far. Now we get to the bad parts:

The protocol does not appear to be designed for composability with multiple IME's. My compositor (kwin) asks you to choose one, and only one, IME in the configuration menu;
It only works if a text field is in focus. It is not the kind of thing that would be useful for things like "press down to scroll the webpage down";

And last but not least: as you can see in the protocol of zwp_input_method_context_v1, there is a keysym request. However, if you search through the corresponding zwp_text_input_v3 protocol, you may notice a distinctive lack of a corresponding keysym event that would notice the client that the IME send a keysym request. So what actually happens to the keysym requests that the IME sends?

It turns out zwp_text_input_v1 and zwp_text_input_v2 used to have a keysym event, but that event got removed from zwp_text_input_v3. After digging through the source code of Kwin, it seems that when zwp_text_input_v3 is in use, then the compositor will "convert" all keysyms received from zwp_input_method_context_v1::keysym back to scancodes ("like this"), and then forward that scancode to the application. Scancodes are again subject to the XKB keymap, so we lost the ability to send keysyms that are not on the active keymap.

It seems that zwp_text_input_v3 originates from gtk_text_input [source], but I haven't been able to find anything about the related discussion that went into it beyond being mentioned in this short article. Anyway, I imagine that the keysym event got removed because receiving keysyms directly does not interface nicely with the rest of libxkbcommon and hence is a pain to deal with for the client.

KarsMulder commented 10 months ago

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

I wonder how games read those joysticks. You generally need to be root or member of the input group in order to directly read event devices. Do games (or Proton?) escalate to root privileges just to be able to read joysticks?

For our purposes, I think libinput is a much better layer to put a mapping API than wayland. It also reads directly from evdev, so it might be the one place where pretty much anything an input mapper would do comes together.

I think that it is kinda unfortunate that Wayland simultaneously handles display and input. Why should it be the display server's job to decide what input reaches the applications? Why can't the user be free to choose their display and input server separately? If only that libinput part that's baked into all Wayland compositors became dynamically swapable...

(Without LD_PRELOAD please.)

But that's pretty much the "Input Daemon" suggestion I made, and that has its drawback too.

(But I still sometimes feel like brazenly suggesting a protocol that basically says "The compositor is no longer in charge of the wl_seat global, all requests to it must be relayed to another application and all events from it will come from another application. With some adjustments the idea might not even be as insane as it sounds at first.)

Anyway, even if we did decide that libinput got extended to work nicely with keymapping, we would still need to figure out the protocol that multiple applications could use to simultaneously keymap. And if we had such a protocol, we could think about why it shouldn't just be a Wayland extension protocol.

(Also, is libinput even aware to which window its input goes? I got sidetracked by the IME's and still haven't gotten to the bottom of that.)

kermitfrog commented 10 months ago

As far as keyboard mapping goes, this sounds good so far. Now we get to the bad parts:

It only works if a text field is in focus. It is not the kind of thing that would be useful for things like "press down to scroll the webpage down";

I only thought about using IME for injecting unicode text that is not in the keymap anyway, so that's not really a problem (I really can't think of a use case for that other than writing text).

The protocol does not appear to be designed for composability with multiple IME's. My compositor (kwin) asks you to choose one, and only one, IME in the configuration menu;

This one is :(. Maybe there is a way to put an injection layer in between the IME and the compositor? Otherwise we would need to extend the protocol to use IME for unicode injection.

There is one big exception though: libinput does not handle joysticks. Games (and possibly other applications) usually just grab joysticks, bypassing libinput/Wayland.

I wonder how games read those joysticks. You generally need to be root or member of the input group in order to directly read event devices. Do games (or Proton?) escalate to root privileges just to be able to read joysticks?

No, its simpler: device nodes handling a joystick get different permissions.

crw-rw----  1 root input        13, 81 Nov 24 09:08 event17  <-- Mouse
crw-rw----+ 1 root input        13, 82 Nov 24 09:23 event18  <-- PS4 Controller joystick part
crw-rw----  1 root input        13, 82 Nov 24 09:23 event19  <-- PS4 Controller Motion Sensors
crw-rw----  1 root input        13, 83 Nov 24 09:23 event20  <-- PS4 Controller Touchpad
crw-rw-r--+ 1 root input        13,  0 Nov 24 09:23 js0      <-- PS4 Controller joystick (old?) protocol

getfacl event18 prints:

# file: event18
# owner: root
# group: input
user::rw-
user:arek:rw-
group::rw-
mask::rw-
other::---

When switching to a different user (without logging out the first), the username gets changed to that one (user:tmp:rw-), so I guess polkit or something similiar is involved.

Also, if you're wondering: grabbing event18 will block events at js0.

Anyway, even if we did decide that libinput got extended to work nicely with keymapping, we would still need to figure out the protocol that multiple applications could use to simultaneously keymap. And if we had such a protocol, we could think about why it shouldn't just be a Wayland extension protocol.

Yeah, maybe we should focus on defining the protocol first..

(Also, is libinput even aware to which window its input goes? I got sidetracked by the IME's and still haven't gotten to the bottom of that.)

I'm pretty sure it is not. That's the thing we really need the compositor to provide and it would be great if we could make it part of the wayland core protocol. Until then, kwin might be the only choice for people who need this.

sezanzeb commented 10 months ago

I only thought about using IME for injecting unicode text that is not in the keymap anyway, so that's not really a problem (I really can't think of a use case for that other than writing text).

I remember that sometimes applications have keyboard shortcuts that use special characters, which aren't accessible on my german layout without using modifiers.

KarsMulder commented 10 months ago

After looking at libinput some more, it does seem to have some seriously useful features such as such as button debouncing and palm detection, which filter out events sent by the hardware that were never intended by the user. You generally want your keymapping scripts to skip over those as well.

If we were to map after libinput, then we run into the problem that libinput merges all input devices into seats, where all similar devices get merged together into one device. This would make it impossible to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

However, taking a closer look at the libinput source code, the situation may not that bad: libinput does report for each event from which device it originates (libinput_event_get_device), and as far as I can see, it does generate multiple KEY_DOWN events if the same key is pressed on multiple keyboards, it just also sends a seat_button_count along with each event, telling you how often that particular key has been pressed across all devices belonging to that seat.

However, if we add our mappers after libinput, then we do have to (?) map libinput events. The problem with mapping libinput events is that they're kind of unwieldy. For example, this is the libinput event for pointer events:

// Code taken from libinput. Copyright © 2013 Jonas Ådahl, © 2013-2018 Red Hat, Inc.
// Licensed under MIT. See the header of the original file for the full license:
// https://gitlab.freedesktop.org/libinput/libinput/-/blob/b600cc35c5b001cbc6685d4d95ce2f3d36fb3ae4/src/libinput.c

struct libinput_event_pointer {
    struct libinput_event base;
    uint64_t time;
    struct normalized_coords delta;
    struct device_float_coords delta_raw;
    struct device_coords absolute;
    struct discrete_coords discrete;
    struct wheel_v120 v120;
    uint32_t button;
    uint32_t seat_button_count;
    enum libinput_button_state state;
    enum libinput_pointer_axis_source source;
    uint32_t axes;
};

That's quite a lot more than what Wayland reports to applications. Some of it is redundant, like the same coords in different coordinate formats; a mapper script would have to take care of modifying all of them at once. It also contains a painful seat_button_count, which tells you how often a particular key or button has been pressed across all devices assigned to a seat. If you were to map only one device, you'd mess up the seat_button_count on all other devices. And last but not least, I feel like this kind of event leaks too many implementation details to be a good candidate for standardization.

The ideal solution would involve rewriting libinput with a more modular architecture where the various features it provides are implemented as different layers, and where third party modules can be inserted in the middle of the processing chain (e.g. after filtering out palms, before gesture detection and before the coordinates are formatted in a bazillion different ways), but I have my doubts that we can get the original libinput developers to go along with that plan.

The Wayland protocol does not send the entirety of the libinput events to applications either. Maybe we can get away with simplifying the event format after it leaves libinput? [Edit: this sentence is false, Wayland does send the approximate entirety of the libinput events to applications.]

pallaswept commented 10 months ago

to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

FWIW, this is a thing disabled users need, too. Definitely a real use-case.

kermitfrog commented 10 months ago

I remember that sometimes applications have keyboard shortcuts that use special characters, which aren't accessible on my german layout without using modifiers.

Now that you mention it, it seems soooo obvious... As someone who uses the programmer dvorak layout, I have often run into programms (mostly games) which expect me to press things like '1', '2' or '3' without a modifier and it's a real pain in the a.. :(

Although I have a rough plan on how I can avoid most of these issues in the future, it would be great to have a proper solution that does not involve editing xkb layouts.

But considering how many different approaches to handling keys there seem to be in different programms / frameworks, I am very doubtsful about how much events with unicode codepoints will be able to help with this mess. But it's still an approach worth thinking about..

to apply different mappings to different keyboards, which is a use case that is sufficiently real that I'm doing it right now.

FWIW, this is a thing disabled users need, too. Definitely a real use-case.

I think multiple keyboards might not be uncommon among users who use event mapping. I myself have a keyboard, a footswitch and a keypad, all of which register as a keyboard.

However, if we add our mappers after libinput, then we do have to (?) map libinput events. The problem with mapping libinput events is that they're kind of unwieldy. For example, this is the libinput event for pointer events:

I would not have thought the output struct is that big O.O

The ideal solution would involve rewriting libinput with a more modular architecture where the various features it provides are implemented as different layers, and where third party modules can be inserted in the middle of the processing chain (e.g. after filtering out palms, before gesture detection and before the coordinates are formatted in a bazillion different ways), but I have my doubts that we can get the original libinput developers to go along with that plan.

Yes, I believe that would be best, too. And share your doubts as well :/.

But what are the alternatives (at least if we want a full-parts-mapping-protocol)? Some ideas:

Fork libinput and hope that wayland makes it easy to choose which one is used.
Remap at different levels, meaning pre and post libinput.
Maybe libinput-devs would agree to make things like palm detection callable via API. Then we could call these for preprocessing as needed. I'm not sure how much this would help though..

Maybe we should start by compiling a list of features we need from libinput.. I need to think about this some more..

pallaswept commented 10 months ago

I have my doubts that we can get (people) to go along with The ideal solution....

When I first read this I wrote and then deleted a few angry responses.

Nobody can be forced to help, but nobody should be allowed to stand in the way of fixing this. If somebody prevents fixing this, they are as much a cause of the problem, and their removal from the system is as much a part of the solution, as any code, protocol, or design concept.

I can't stand it when people fork or build alternatives, rather than improving existing solutions; it usually just creates a mess and makes it harder for end users to have a coherent system, usually they end up having to choose between two incomplete solutions.... I really dislike forks in general.... but if one is not allowed to improve existing solutions, one has little choice but to build an alternative, be it from a fork or from scratch.

I like to hope that the devs of any project which would be involved, will recognise any shortcomings in their implementation and not only be willing to take contributions, but also to assist in contributing themselves. I mean, if you built a thing, you'd surely want it to be the best thing it could be, and not have giant problems that make the entire operating system unusable for a significant percentage of human beings. I would like to remain optimistic that the libinput devs would take all of this on board with a positive response.

If it's just some random crippled greybeard retired dev and a small handful of FOSS-enthusiast disabled folk having a cry about it, while all their friends, fellow cripples and demanding high end gamers alike, joke about what a nerd they are and just use Windows or iOS, then I can see it going nowhere - because that's what's happened so far!

However, with knowledgeable and experienced input (pardon the pun) from experts, which moves from just having problems towards building solutions, like you all are contributing, I think this thread amounts to the beginnings of a very convincing proposal to improve existing solutions, and I like to think (hope...pray.......) that the devs of whatever project might need enhancements, would take it seriously and view it as constructive, and not be defensive about it.

kermitfrog commented 10 months ago

[..] I think this thread amounts to the beginnings of a very convincing proposal to improve existing solutions, and I like to think (hope...pray.......) that the devs of whatever project might need enhancements, would take it seriously and view it as constructive, and not be defensive about it.

From the libinput docs:

What libinput is not [..] There are plenty of use-cases to provide niche features, but libinput is not the place to support these. [..] libinput is boring. It does not intend to break new grounds on how devices are handled. [..]

I think these are the descriptions that make us sceptical about acceptance of a big change in libinput. But you are right: what we are preparing here is constructive and needed for various reasons and we shouldn't make the mistake of letting (possibly unwarrented) worries of rejection slow us down. The libinput devs might just as well embrace the new stuff, or at least participate in an alternative solution. In any case it won't hurt to ask!

I think the next steps should be:

Collect what needs to be changed -- I started a new issue for this: #3
Write rough proposal
Send it to the libinput devs

I probably won't have enough time for this before wednesday (or friday) though.

[..] Back in windows-land, it's not even a bat of an eyelid to be running 5 or 6 input handling tools like this simultaneously. Nobody talks about it because it's normal. [..]

Out of curiosity I looked at the windows input API docs. From maybe an hour of reading, this is what I understand:

There seems to be only one input stream which only distinguishes devices between Mouse, Keyboard and other. There is one call BlockInput, which seems to block all keyboard and mouse input from reaching other applications. The thread that used this to block input, can then still get physical events inject new events into the input stream.

Also: keyboard events can carry unicode characters (16-bit, I'm sure it means UCS-2 encoding). If that happens, it generates a virtual event (I think).

But I really don't understand how multiple applications are supposed to work together - from what I read so far I'd expect the situation to be a lot worse than on linux. I'm most likely missing some knowledge about how the input stream works.

pallaswept commented 10 months ago

I think these are the descriptions that make us sceptical about acceptance of a big change in libinput

Yeh I kinda honestly feel like there's a strong likelihood they'll vehemently "nope" this, on the spot. Then again, I've heard a bit recently about them adding support for IIO (as in, Industrial IO; accelerometers, light sensors, weird input devices) and there's the very closely related libei they've recently added to their stack, so ...mehhh I dunno I have strong doubts for the same reasons you mentioned, but kinda also feel like maybe they'll be really feeling all this and might just get involved. I really wish some of those devs were in this thread right now. I feel like even if they "nope" it for their own project, they'd tell us how to make it happen in some other way. Even in the worst case scenario, they say "hell no, and the only way it would happen is if we say yes, so you'll have to fork libinput, now stop wasting our time and don't talk about it any more" at least we know what's in front of us. It feels like there's a pool of knowledge among those devs, that we're missing out on....so

In any case it won't hurt to ask!

Yeh! :slightly_smiling_face: I feel like getting their input is definitely on the cards. Thanks so much for getting the ball rolling on that one. I'm glad you started a nice new clean issue for it too.

Also I might just tag @MerlijnWajer here, who hasn't updated uinput-mapper in a decade but was very early in this game and might have some interesting thoughts here. Sorry if @'ing you was an annoyance, Merlijn! I just thought you could be a valuable player here :)

KarsMulder commented 10 months ago

I've been thinking about the new Wayland protocol and posted my current (incomplete) draft in a new issue: #4

I've got a good feeling about this one, but there's still quite some work that I need to do. It is neither fully implemented nor fully documented yet, some parts of the current spec are broken, et cetera. Anyway, I just wanted to post my current progress to show that something is getting done.

KarsMulder commented 10 months ago

libei

This is pretty big.

It is basically an API for creating virtual input devices. Combined with our ability to just grab all input devices, we basically have the necessary API's for creating an "input daemon" as I mentioned earlier.

While an input daemon is not the perfect solution, it does provide a big possibility: suppose we create some Wayland protocol and write a library that implements it, but compositors are reluctant to implement it. Then we could write a daemon which grabs all event devices, processes those events through libinput and our library, and then makes the resulting events available through libei.

Mapper scripts could then check if the compositor natively supports the protocol, and if the compositor doesn't but does support libeis, start the daemon as fallback.

The daemon approach still has disadvantages such as requiring another process to run, another program to install, would make all devices show up as "virtual devices", prevents other applications like Qemu from grabbing the evdev devices, may not be able to change the keymap, may not be able to perfectly switch the active mapper based on the active window, et cetera. But it could provide a somewhat suitable fallback for users who are stuck on a compositor that does not support the new protocol but does contain libeis.

kermitfrog commented 10 months ago

libei

This is pretty big.

Yes, it could be useful. In addition to creating the daemon, it offers possibilities to directly transform evdev-level events to post-libinput level events. I wonder if there are good use cases for that..

pallaswept commented 8 months ago

I hope nobody minds, but I came across a related thread, where the above issues were discussed (well, brought up but not discussed much), so I linked this thread in the hopes that some of the (very important, respected, and capable) individuals there might perhaps weigh in on the conversation. The thread is over here https://discuss.kde.org/t/new-ideas-using-wayland-input-methods/3444/19. I just thought I should let this end of the conversation know that I'd linked it. Again, I hope this is OK, apologies if I've done the wrong thing.... Just.... a lot of the people in that thread are a pretty big deal and they're all working in this field at a fairly high level.

kermitfrog commented 7 months ago

I'm back! Well.. at least I should have some capacity for input stuff again :)

One of the time-consuming things I did the last weeks was to switch my keyboard to a Dygma Defy. This made me re-evaluate how I use my keyboard and I ended up modifying my layout (xkb-wise) as well. This gave me an idea for a (partial) workaround to the type-arbitary-unicode-symbols problem.

Let's start with a few often overlooked facts about xkb:

You can map any keycode that a keyboard could send. (in contrast to windows, which limits you to the visible-characters-without-numpad part of a 105-key keyboard)
It features key composition, which can be used to type characters that are not directly found on the layout.
- + Composition does include dead keys as well as the Compose key (aka Multi-Key)
- + The Multi-Key is a mappable key itself, which should mean that we can place it on any key.
- + It seems that you can use input characters that require a modifier.
- + Compositions are defined system-wide under /usr/share/X11/locale/*/Compose. Users can have their own compositions in ~/.XCompose
- - changing these settings might require a re-login
- - typing arbitary unicode values is not a xkb/compose feature! Mentions on the internet of doing that with Ctrl+Shift+U seem to rely on ibus being set as input method.
- - this will probably not work in every application.

Now one sometimes forgotten fact about input remapping via uinput:

The keys supported on the virtual keyboard (output) are not limited by the keys supported by any real input devices.

That means: as long as we know which unicode characters can possibly occurr (defined by user configuration) and the mapper is aware of the current layout (I already wrote some working proof-of-concept code for this last year), we can:

Find the Multi-Key on the current layout or help the user to map it to some extra key that is only present on the virtual keyboard.
Generate compositions for all missing characters.
Trigger the necessary key combinations.

pallaswept commented 7 months ago

Good to see you back, Kermit :) And Hi,all! This tab stays open in my browser for the time being... I think we're on a long road, here....

I came across a reddit post about this article, entitled, "Input method on Wayland is broken and it's my fault" which rang some bells here. The reddit thread mentioned ibus-typing-booster which I've tried out lately, and it's promising on a few fronts discussed above, but rather bug-prone at the moment. I leave it installed on my machine, in hope, but presently, it remains disabled.

Thought I might share the article in case it might be food for thought, or perhaps bring a 'new recruit' to this issue :laughing: At least maybe if the author were to see this thread, they would not feel quite so much the lone bearer of fault in this situation... I don't think it's anyone's fault really. We are in need of a hero, or ten :wink: Do you think we should maybe send them a message?

KarsMulder commented 7 months ago

Thought I might share the article in case it might be food for thought, or perhaps bring a 'new recruit' to this issue 😆 At least maybe if the author were to see this thread, they would not feel quite so much the lone bearer of fault in this situation... I don't think it's anyone's fault really. We are in need of a hero, or ten 😉 Do you think we should maybe send them a message?

It seems we've found a real expert here. According one of their other articles, we're talking about the person who designed all the Wayland extension protocols around input methods. Feel free to message them.

To make visiting this thread maybe worth their time, here are some of my thoughts about "Mistake 2: bad synchronization" mentioned in the linked article.

(I am not actually sure if I understood the problem correctly. Does "commit" mean that the preedit string is to be turned into definitive content of the text box? If yes, the why does the second preedit string "Mo" still contain the "M" which should've become permanent content already? If no, why did the "M" character get reported as content of the text box due to lag? Anyway, here are my thoughts for as far as I think I understand the article.)

I get the impression that the fundamental problem is that the IME does not know which of its proposed changes were accepted and which were rejected. If it does not resend its changes when they do not show up, it is possbile that input gets lost when a web document is edited by somebody else. When the IME does aggresively resend any change that does not observe as having shown up in the text box, then there is probably a whole other can of bugs about to spring open.

If changing the protocol is still on the table (the protocol is still unstable after all), then I think this could be solved by making the "commit" message include from which state to which state is transformed, which makes it possible for the input method to figure out which of its actions were discarded.

Both the application and IME start at state 0. When either of them wants to change the content of the textbox, they must include both the old state number and the new state number in part of the commit message. The IME always uses even numbers for states that it creates, whereas the application always uses odd numbers for states that they create, to avoid clashing state numbers.

So, typing "Mo" would result in the following exchange of messages:

no_contest

All of these states were created upon initiative of the IME, so they all use even numbers. The application acknowledges each state transition explicitly, so the IME knows that all of its keys were accepted.

Now let's consider the laggy situation where the user is trying to type "Mo", but a collaborator on a web document types an "a" while the IME is still busy composing:

contested_commit

While the IME was trying to compose "Mo", the application received some TCP packet telling it to insert an "a" key after it read the "M" key from the IME but before it read the "o" key. From the application's perspective, two state transitions have happened:

0 → 2, the transition that added the "M" due to user input
2 → 3, the transition that added the "a" by a collaborator across the internet

At this point, the application is in state 3. It then receives a request from the IME to transition from state 2 to 4, but the application rejects it because it is not in state 2. The application informs the IME that it has observed two state transitions: 0 → 2 → 3.

The IME sees that the transition 0 → 2 was acknowledged by the application and thus the "M" key was accepted, but it has also sent a request to transition "2 → 4". Because the application moved from "2 → 3" instead, the IME knows that its second request has been or will be rejected, and thus that the application has not received the "o" key.

It knowing that the "o" key has been rejected, it then tries to play back all rejected requests, but this time based on the last state reported by the application. The user tried to type "Mo", the text now shows "Ma". If the "o" key went through, the text would be showing "Moa", so it sends a new request to transition 3 → 6 and change the text to "Moa".

A minor thought: maybe the state should not be assumed to start at 0, but instead at a value declared by the compositor. Furthermore, the split between the state numbers allocable by the IME/the application should maybe not be even/odd, but "within a range that is allocated by the compositor". This could maybe make it possible to use multiple IME's at once if they are all allocated distinct ranges, but that's another can of worms I haven't fully thought through.

kermitfrog commented 5 months ago

I read through the messages we wrote here and got inspired with a new idea. For now I call it UInput Orchestrator (UIO). In short, it is a daemon that manages connections between mappers by creating and assigning multiple uinput devices, but could be extended to something else later.

It is not meant as the final solution, but rather an extensible starting point that we could implement without changing anything in evdev, libinput or wayland.

As this will likely be a longer topic, I created a new issue here: #5

pallaswept commented 4 months ago

Saw some news today about the KDE Plasma 6.1 release and they mentioned the "Input Capture Portal" which immediately captured (heh) my attention. Apparently its intended use-case is allowing software which shares keyboard/mouse between PC's, but perhaps we might find some way to use it to get our keyboards working locally?

Left this here in case UIO is not the final direction (although, it looks like it might be!)