Call for possible collaboration

kermitfrog / inputmangler

Inputmangler is a daemon that intercepts and transforms linux input events, depending on the active window. It aims to be highly configurable, reliable and not too hard to use. You can use it to remap those extra mouse buttons, properly utilize a footswitch or even remap your second keyboard to trigger all kinds of shortcuts! If you are left-handed, you can switch left and right mouse buttons for applications that ignore your Desktops settings (like dosbox and fs-uae). And you can have a different configuration for each window! It is also capable of translating Text from the network to key presses.

Other

5 stars 1 forks source link

Call for possible collaboration #2

Open kermitfrog opened 1 year ago

kermitfrog commented 1 year ago

Hi,

I'm the developer of a tool called inputMangler, which transforms input events on linux. After a few years of other priorities I want to continue development (well.. rewrite it from scratch actually..). As I like to avoid duplicate work, I had a look around the net to see if someone else started another project like mine. I found a few which at least do something similiar and, if you're mentioned at the end of this post, one of them is yours.

While all those projects seem to have more or less different goals and approaches, there still might be enough common ground for collaboration. So this thread is about exploring possibilities to work together.

In the next post, I will write an overview of my goals. I invite everyone interested to do the same.

Afterwards we can compare those and discuss if it would make sense to • put some base code in a common library • merge projects (may be unlikely, but .. maybe) • just share experience on strange input-related problems ;D

Links to the projects: https://github.com/kermitfrog/inputmangler https://github.com/sezanzeb/input-remapper https://github.com/samvel1024/kbct https://github.com/shiro/map2 https://github.com/rvaiya/keyd https://github.com/snyball/Hawck https://github.com/KarsMulder/evsieve

And the people that I hope will have a look at this after receiving a notification for being mentioned: @sezanzeb, @samvel1024, @shiro, @rvaiya, @snyball, @KarsMulder

kermitfrog commented 1 year ago

Goal overview of inputmangler

Working in current version

direct remapping of linux input events (keys, mouse wheel, etc.) for multiple devices
hierarchy of presets, which activate depending on window class and window title (no need for manual change of presets)
support for triggering combos (e.g. shift-a), macros, mouse wheel acceleration
dbus interface (change preset, update config, create event, print current window information)
localized key names (kind of - they're actually custom configured)

Planned

Easy to use UI
system service which uses something like policykit to detemine current user
plugins

Nice to have if feasable

handle more complex input like key combinations
possibility to run external commands or sending dbus events on certain conditions
config format that can easily be edited manually. This was a priority before and is working in the current version, but I would change it in favor of a cleaner format.

Frame

linux first
focus on reliability, performance and security
- rewrite in Rust
- experimental features that are in conflict with above requirements:
- have to be explicitly allowed in service config
- have to be explicitly activated by the user
- trigger a warning in UI if activated

sezanzeb commented 1 year ago

Have you considered contributing to the existing ones?

I'll probably invest my time into maintaining the 2.0 (beta right now) release coming in February

kermitfrog commented 1 year ago

Have you considered contributing to the existing ones?

I am still considering that and its part of what this thread is about. We all have certain ideas about what the program should do and how best to achieve that. So the question is: is there a project where I can realize my goals by contributing (this would clearly be the preferred way) or are the projects goals incompatible with mine?

Project descriptions tell something about what the current state is, but little about what is planned.

Having everyone summarize their goals and priorities would help a lot to clear this up.

sezanzeb commented 1 year ago

For input-remapper the current goal is to finish 2.0 up. The current work on that is happening on the beta branch: https://github.com/sezanzeb/input-remapper/tree/beta. After that pretty much any input can be mapped to anything. For example mouse movements to joysticks. It will feature an overhaul of the GUI to support all that without editing configs. After the release, people might discover bugs, since a lot of new stuff will be released.

See https://github.com/sezanzeb/input-remapper/issues/177 for information about contributing, and https://github.com/sezanzeb/input-remapper/blob/beta/readme/development.md for some technical details.

Works:

direct remapping of linux input events (keys, mouse wheel, etc.) for multiple devices

support for triggering combos (e.g. shift-a), macros, mouse wheel acceleration

dbus interface (change preset

system service which uses something like policykit to detemine current user

handle more complex input like key combinations

config format that can easily be edited manually. This was a priority before and is working in the current version, but I would change it in favor of a cleaner format.

reliability

If input-remapper doesn't work, then it is usually because something is fundamentally broken or impossible as of now. But it seems to be quite stable during operation. There are tons of automated tests.

Easy to use UI

I like to think it is

linux first

input-remapper will never support anything other than linux probably

Somewhat works:

hierarchy of presets, which activate depending on window class and window title (no need for manual change of

Via third party software: https://github.com/DreadPirateLynx/input-remapper-xautopresets. This needs to be individual for each Desktop Environment. There is no solution that works for all Wayland DEs. It's easy to do in X11 apparently.

localized key names (kind of - they're actually custom configured)

This works for X11, gnome on wayland, plasma on wayland, but other DEs that run on wayland may not support it properly. Input-remapper has to rely on using xmodmap -pke to get that information, which is part of xorg. Right now this is not causing trouble.

performance

Not causing any issues, but CPU usage can go up to 5% on my computer during usage (on a single core). input-remapper-service has never been profiled properly, there might be potential for optimization.

security

Key logging is possible or a few minutes while the GUI is open. There is no way around that. Because information has to go from a privileged service to the unprivileged GUI via a pipe to record input. Other than that, I don't think input-remapper is leaking input anywhere during normal operation.

Doesn't work:

plugins

possibility to run external commands or sending dbus events on certain conditions

Because the daemon runs as root, which is a security problem if mappings trigger commands and challenging to sandbox properly to not cause problems. I'd like to avoid those things. Running external commands is often possible via the DEs settings, and probably sufficient for most users.

dbus interface update config, create event, print current window information)

updating config is done via the GUI, which just writes to a json file

kermitfrog commented 1 year ago

Thanks for the info :)

Easy to use UI

I like to think it is

I tried your beta and think the UI has a solid concept and is well done (although some polish is needed - which is to be expected in beta). However, it follows a different logic than I would like (Device > Preset > Mapping vs. Preset > Device > Mapping).

I find defining output combos difficult though. Autocompletion is a great idea, but recording the output sequence should be better in most cases.

Not causing any issues, but CPU usage can go up to 5% on my computer during usage (on a single core).

For comparison, I tried it on my computer and it goes up to 9 % for mouse movement and 12 % with a SpaceMouse compared to 1.3 % / 2 % with inputmangler, so there is clearly room for imrovement. Are you using Cython?

sezanzeb commented 1 year ago

Are you using Cython?

Yes, not much difference with pypy once the jit compilation has started optimize it

I find defining output combos difficult though. Autocompletion is a great idea, but recording the output sequence should be better in most cases.

Have you seen the information on the botton of the output editor?

It was added for the purpose. If there is no device available to record the output the user wants it might get difficult to set certain mappings.

sezanzeb commented 1 year ago

although some polish is needed

You are very welcome to tweak it in Glade and to make a PR

Device > Preset > Mapping

and also to create a new issue to discuss this. Showing how the GUI would have to change, and explaining how the workflow of recording input would change would be helpful there :)

KarsMulder commented 1 year ago

My two cents: I think that the hardest part of a keymapper project is actually not the implementation, but the design.

If the user wanted to have full control over how input maps to output, then there is already python-evdev for that. The disadvantage of python-evdev is that it requires some boilerplate and it is difficult to write scripts that do not suffer from many different edge cases.

(Particularly, before I started on evsieve, I had about two dozen python scripts for different things, and regularly observed that writing a script that did thing A was relatively simple, writing a script that did thing B was relatively simple, but writing one that did both A and B was really difficult due to edge cases introduced by their interaction. Relatedly, the big time sink for adding new features to evsieve is not figuring out some way to implement it, but deciding on how that feature should interact with the other features that are already there, and figuring out what would be the most sensible behaviour for every edge case that could come up.)

Several projects have started to search for a higher-level way to describe ways to transform events. These higher-level configurations tend to make simple things easier but difficult things harder or impossible. The big question is how flexible you want your configuration language too be: if your configuration is too simple, many things users might want become impossible. If it is too complex, it ceases to offer much advantage over just writing a python-evdev script.

Many different projects have struck the balance between simplicity of configuration and versatiliy at different points. Before you start working on implementation details, I think it is important to first of all figure out exactly which kinds of transformations you intend to support and how you intend to present that in an user-friendly way to the user.

Since we can't beat python-evdev on versatiliy, we need to beat it on ease-of-use and user-friendliness. Having a user-friendly user interface for your targeted level of versatiliy is where the value of keymapping programs lies.

In particular,

Nice to have if feasable

handle more complex input like key combinations

I think that whether, how, and which "complex input" you intend to support—along with how you intend to present that configurability to the user—is a fundamental question that needs to be considered before anything else, rather than treated as an afterthought. The answer to this question will impact just about every other part of the development process.

sezanzeb commented 1 year ago

In our case it would be the "mapping handler" architecture, which is like a pipeline, combined out of multiple handlers that can do different things. As far as we are know it is finished on beta. We'll have to wait and see if someone raises issues about certain things not being possible. It allows for example to combine mouse-movements with button clicks to produce some other output.

kermitfrog commented 1 year ago

@sezanzeb

although some polish is needed

You are very welcome to tweak it in Glade and to make a PR

Learning GTK / Glade is out of scope for me, but I'll create issues so you know what I mean.

Device > Preset > Mapping

and also to create a new issue to discuss this. Showing how the GUI would have to change, and explaining how the workflow of recording input would change would be helpful there :)

My plans for the UI are incomplete, but it will involve a TreeView to represent the hierarchy (group -> [subgroubs ..] -> window -> title) while mappings for all devices for that preset should be visible in the same view. There is also the matter of "can be made to work" vs. "works well". If I were to transfer my current inputmangler configuration to input-remapper (using xautopresets) I'll probably end up with the configuration spread over more than a hundred files. Also in inputmangler, mappings can be inherited - I don't think that can be currently represented in input-remapper. So the config format would have to change to make it work well. I have to think a bit more about it..

Speaking of UI - I'm currently learning QML/Kirigami and have plenty of experience with the rest of Qt (mostly in C++ though), so I might help a bit with your Qt port. More by answering questions though, as it's not a high priority for me right now.

@KarsMulder

My two cents: I think that the hardest part of a keymapper project is actually not the implementation, but the design.

Yeah, I totally agree. One thing I'd like to explore here is which projects have (partially) compatible designs.

Since we can't beat python-evdev on versatiliy, we need to beat it on ease-of-use and user-friendliness.

Yep. But don't forget about performance. I don't like to have tools running in the background that use up more ressources than they need..

Do any of you have detailed documentation, describing your projects design?

snyball commented 1 year ago

If I were to create Hawck from scratch, this is the architecture I'd probably go with:

A single input-capture-redirect service with a small custom sandboxed VM, runs as a user with access to input, and accepts literally any piece of code given to it on a UNIX socket from the desktop user. Safe because the VM cannot interact with the outside world, which was rarely needed in practice anyway. And similar to @kermitfrog I'd want to write this new version in Rust rather than C++.

The system should have access to not just keyboard/mouse/controller input but also many xdg-desktop-portal extensions, preferably the portable ones, and should include some wm-specific functionality that doesn't exist portably for Wayland compositors right now (like currently focused window.) Also random number generator, tty-detection, open-in-browser, etc.

As for launching things, I think we could provide functionality for launching .desktop files, but only from /usr/share (and never using the users $PATH or really any of their env) without thwarting the Wayland security model.

Then any GUI-based thing can just talk to this input service, and it should be flexible enough to do whatever one of those GUIs might want to do, and any text-based system can be compiled to the VMs bytecode.

I've been thinking about building this service just for fun, but it has ended up on the back-burner for a while because low-level Linux input stuff can be kinda frustrating due to a lack of documentation in a few areas.

If anyone else thinks this is a good idea, I'd write a spec for this architecture for reuse in other projects.

Of course, 99.9% of users are looking for one of a few select specific things like replacing caps-lock with ctrl/escape, but I still think a highly generic but safe and fast keyboard remapping system is a nice-to-have for the platform.

jonasBoss commented 1 year ago

I started work on the InputRemapper beta branch a year ago in order to solve my personal needs (using a 3DConnexion SpaceMouse as Joystick). Which somehow escalated into reinventing the whole architecture. That pretty much confirms the concerns raised by @KarsMulder:

I think that the hardest part of a keymapper project is actually not the implementation, but the design.

That said, I think the current approach can accomplish almost any reasonable remapping (mouse/joystick -> keyboard and mouse <-> joystick) with support for combinations in each case. + macro support to generate complex input-sequences (I think it is possible to make keyboard-> joystick/mouse mappings with macros).

There are some limitations:

Inputs need to be processed as whole frames (EV_SYN - EV_SYN), not on a per event basis: https://github.com/sezanzeb/input-remapper/issues/457 this needs to be addressed, as there are multiple bug reports with this issue.
Cross device combinations are not possible. This is actually not true We do support combinations across multiple /dev/input/eventXX devices just not across different physical devices as they run in different processes. So this limitation is currently by choice not necessarily by design.
Macros have limitations https://github.com/sezanzeb/input-remapper/discussions/465#discussioncomment-3529920 @snyball suggestion for a sandboxed process could be a good solution to support proper scripting in macros.
Python is slow. InputRemapper has quiet some overhead. It was never properly profiled, so there might be potential to optimize it. But there is no arguing that Rust or C++ would be much faster.

In general I think it is quite possible to design a common Sevice which is simple to use for simple tasks e.g. remapping of n inputs to one output. But also provides a api for user scripts and more complex behavior. Implementing a good

dbus interface (change preset, update config, create event ...)

will make it possible to develop different GUIs or simple scripts which may or may not maintain their own configurations and translate them for the service.

sezanzeb commented 1 year ago

across different physical devices as they run in different processes

I sometimes wonder if this limitation can be avoided. Soon mappings will hold the information of their source device, so we could as well just record from all devices at once I guess. Idk.

Python is slow. InputRemapper has quiet some overhead. It was never properly profiled, so there might be potential to optimize it. But there is no arguing that Rust or C++ would be much faster.

For performance, if there really are no good optimizations possible, I'd not be very opposed of translating everything to a different language. It probably doesn't matter which one, because python is pretty much one of the slowest widespread languages. Translating the Tests could be a bit tricky sometimes, but they cover a lot of edge cases and past bug reports, so that would be really nice to be able to keep them. But anyway, if someone could do some profiling that would be great.

@snyball suggestion for a sandboxed process could be a good solution to support proper scripting in macros.

a small custom sandboxed VM

Also see https://github.com/sezanzeb/input-remapper/issues/500. I thought lua doesn't require a vm to sandbox it, or does it?

KarsMulder commented 1 year ago

Do any of you have detailed documentation, describing your projects design?

I do not have such documentation written other than the comments interspersed through the source code, but I can give a quick rundown of the major parts:

The input system The input system uses the C library libevdev to open and read events from devices. It uses the Linux epoll syscalls to wait until an event becomes available on any of the devices.

I have benchmarked epoll vs poll and was not able to find any measurable difference in performance.

I have not benchmarked how the performance would compare against using LIBEVDEV_READ_FLAG_BLOCKING. I wasn't even aware that was possible when I started writing, and at this point it would be too much hassle to implement it.

Argument parsing The command line arguments are parsed using the arguments module in a two steps: a parse() function which turns the textual commands into structs, and an implement() which turns those structs into variants of the enum StreamEntry and enumerates which input/output devices need to be opened. Those StreamEntrys are kind of like an internal bytecode that is used to process events. Many of the arguments map to a single StreamEntry, but not necessarily so. Some similar arguments like --map and --copy are both mapped to the same StreamEntry, and some arguments are mapped to a combination of multiple StreamEntrys.

Event propagation All StreamEntries are expected, if applicable, to define two methods similar to the following: apply_to_all(&[Event], &mut Vec<Event>) and apply_to_all_caps(&[Capability], &mut Vec<Capability>). The first function takes as input a vector of events, and should write what those events map to to the output vector. Events that the entry does not interact with should be written to the output vector as-is. An entry can drop events by not writing them to the output vector. Each entry is supposed to preserve the order of the events handed to it.

At a first glance, you may think that this use of out-pointers looks like a bad practice that originates from the time of C, and modern programs should just return Vec<Event> instead. However, I have found that the use of out pointers not only measurably increases performance, but surprisingly is also easier to work with. For example, if apply_to_all wants to delegate its task to another function (e.g. apply(Event, &mut Vec<Event>)) then it can just pass its out-pointer to that function and things will magically go well, which easier and more performant than having to do an .into_iter().flat_map(_) every time the task gets delegated. Furthermore, most of the reasons that out-pointers are a pain to work with in C are avoided in Rust since there are no buffer overflows, no buffer size limits, and &mut clearly marks which variables will be modified.

That said, in hindsight I think that processing multiple events at the same time was a bad design decision that is making some new arguments (most importantly, --hook --withhold) a pain to implement. I would redesign this internal model if that didn't break backwards compatibility in some edge cases.

The apply_to_all_caps function works similarly to the apply_to_all function, except instead of events, it works with capabilities: it takes a list of all events that might possibly reach this StreamEntry and should generate a list of all events that it could possibly generate on basis of that. This is how evsieve can automatically decide which capabilities its output devices should have.

The output system First the input devices get opened, their capabilities get read, their capabilities get propagated through the stream, and then output devices are created based on which capabilities come out at the end of the stream.

(Also, if an input device marked with --persist=reopen disconnects and reconnects and then turns out to have more capabilities, the capabilities are propagated again. If it turns out that some output device now needs more capabilities, that output device is destroyed and recreated with the new set of capabilities. I suppose nobody actually needs this, but I am picky about having evsieve work correctly in every single situation.)

Threading structure There is one main thread which does the event handling and uses epoll to wait for new events. New events are read from an input device, send through all StreamEntrys, written to the right output device, and then we wait for epoll() again.

If needed, some additional background threads may be spawned to do tasks that I do not want to delay event handling (i.e. garbage-collecting subprocesses that were spawed using --hook exec-shell= and monitoring the filesystem to see if previously disconnected event devices become available again in case of --input persist=reopen.) These threads communicate with the main thread using an internal UNIX pipe that is also monitored using epoll.

The code is written synchronously (i.e. without using the async feature), for two reasons: (1) in a previous development version that was based on python-evdev before I rewrote it in Rust, I found that using epoll to wait for events had half the latency of using Python's async, and (2) at the time, I heard that the Rust async ecosystem still had several rough edges. I have not benchmarked whether the Rust async feature has the same performance overhead as the Python one, but in the end I think it was the right decision to write synchronous code, because I cannot imagine the codebase becoming cleaner if async was involved.

kermitfrog commented 1 year ago

Hm.. I sense a wide agreement on Rust - no real suprise here :)

Maybe I should write a bit about inputmanglers current architecture (which isn't exactly how I would do it now):

The whole process runs in user-space and device access is managed through group permissions, which are set by udev. The important security concept is not to remap your main keyboard (which I never really needed to, so that's not a problem).
The configuration is split in 2 sections (actually 3, but I'm ommiting the mappings part as it only concerns how events are configured in the other sections).
- The first section defines which input events on which devices are available for remapping in the second section.
- The second section defines the generated events for those input events per window/windowtitle.
uinput devices are created to be able to generate all possible events that are configured in section 2.
For each input device there is a thread waiting for input. When input is read, it looks if it matches an event that was configured in section 1. If so, the corresponding event for the currently active window configuration (aka preset) is generated. Else the event is passed on.
There are different types of generated events (single event transformations, macros, wheel acceleration, ..), so each can be generated with a minimal code path.
- e.g: single events from keyboard to keyboard just modify the input code, then pass the event on.

Things I would like to change:

use a system service instead of group permissions.

Things I would like to keep:

the configuration logic. This does not have to restrict the backend, but would strongly influence how I'd design a GUI.
generate events with minimal code path.
no need to enter password for config changes. Not sure if this is possible without compromising security if the main keyboard can be read, though.

@snyball A VM sounds interesting for the purpose of writing complex macros. The question is: can it be done without compromising performance for simple use cases?

literally any piece of code given to it on a UNIX socket from the desktop user.

I assume you mean that the user process passes code to the service to execute on a given event, which is then done there. Not that events are passed from system space to user space, which then sends something back to system space. Right?

@jonasBoss

.. using a 3DConnexion SpaceMouse as Joystick ..

I sometimes wonder how many people use these things for gaming, compared to those who use them for their intended purpose of 3D-Modelling..

Inputs need to be processed as whole frames (EV_SYN - EV_SYN), not on a per event basis

Inputmangler has the same problem. Doing this per event has worked perfectly fine for me for a long time. But recently mouse wheels are sending normal wheel events alongside hi-resolution events, causing double scrolling events. I worked around this by surpressing the hi-res events, which only caused problems in QtCreator so far. But this definitively needs to be taken care of.

@KarsMulder Thanks for the detailed description. I think I'm going to have a closer look at your code when I have more time.

The input system uses the C library libevdev to open and read events from devices.

I wonder if libevdev causes any measurable overhead compared to direct ioctl calls / device read. This would be an interesting thing to profile.

pallaswept commented 10 months ago

Of course, 99.9% of users are looking for one of a few select specific things like replacing caps-lock with ctrl/escape

Since a lot of relevant people are here, this might be a good place to discuss this.

I'd agree with the above quoted assertion that 99.9% of users just want to do that one thing and be done with it, but that's because the some 25% of all users, who need more (there's a lot of us crippled dudes around), just can't use linux, so they don't. Back in windows-land, it's not even a bat of an eyelid to be running 5 or 6 input handling tools like this simultaneously. Nobody talks about it because it's normal. In linux-land, nobody talks about it because it's impossible. I mistakenly thought that problem has been solved, moved back to linux, and I've found out I was wrong. It's a physically painful mistake, but I'm too far in to go back to windows now, so I want to do what I can to get this sorted.

Since X doesn't support a lot of video features I need, I had been waiting for Wayland tools to mature, so that I could do all of the input mangling I need, which I could do in Windows. I kept my ear to the ground, and over time I heard about many new projects which were wayland-compatible replacements for existing X tools which I used to use in linux. xdotool gave birth to ydotool, some KWin shortcuts features offered keybinding ability to run scripts (that's AutoHotKey taken care of) and finally, the most important one for me, mouse-actions came along to replace easystroke(X)/StrokesPlus(Windows) for mouse gestures. So, I figured it was a safe time to jump ship back to Linux (I can't stand Windows, so this was exciting for me!)

I need to rebind and disable keys and key-combos, bind key combos to external commands, adjust analog input (joystick) sensitivity curves, re-map mouse buttons, map foot switches to scripts, and mouse gestures are an absolute MUST. Why:? Because I'm physically disabled. So all these accessibility tools aren't just "nice-to-haves', they're 'must haves'. And each of the presently available tools on linx/wayland works fantastically. But once I tried to use more than one, I hit a wall, and it's a hard one. While everyone was talking about how the lack of Wayland replacements for classic tools like xdotool had been solved, nobody was talking about the fact that you can't use them all. Only one.

Pretty much (actually I think it's literally) every Wayland input device handler, takes the same approach - go a layer lower in the stack than X11 did, and exclusively grab the evdev devices. It's a simple solution t the problem but short sighted in that it means you get to choose one and only one accessibility tool, because one effectively locks out all the others. It doesn't seem practical or realistic that any single tool should be the all-singing all-dancing solution to every input device accessibility requirement, so the thing that is really needed from all of you tagged in this thread, is to find a way to get your tools to play nicely together.

I'm not really sure of the right way to go about resolving this issue, but I am sure that it means that, at least in it's present form, Wayland is an accessibility failure from the get-go. And I see a lot of people who should be involved in a conversation about this, in this one thread, so I'd be interested to hear your thoughts. Because if you're going to discuss collaboration, this is the first thing that needs to be addressed. None of you could be expected to write a single tool that does everything, nor should the user be limited to that one tool, so finding a way to make them all work simultaneously is step 1 in collaborating (I mean, the word 'collaborating' literally means 'working together' and most of your apps won't work together 😄 )

Since it's been almost a year, I'll do that ping again, apologies if this causes you any consternation: @sezanzeb @samvel1024 @shiro @rvaiya @snyball @KarsMulder @jersou

Speaking in terms of the solution to this problem....it strikes me that what's required here, is a new layer between evdev and these applications, which would exclusively grab the evdev device as these apps do, and then allow these 'client' applications, rather than exclusively locking the devices, to subscribe to callbacks from the intermediary layer, to handle input events; thus allowing a single input event to be handled by multiple applications. Perhaps there's a better way to deal with it, which is why I'm asking you for your thoughts.

sezanzeb commented 10 months ago

Pretty much (actually I think it's literally) every Wayland input device handler, takes the same approach - go a layer lower in the stack than X11 did, and exclusively grab the evdev devices

a new layer between evdev and these applications

I like this idea

This avoids grabbing, while still allowing applications to hide events from the desktop-environment. This way, multiple mapping-tools can map the same event to whatever they like.

Those new pipes that applications read from could be compatible with tools like python-evdev by behaving exactly like uinputs/InputDevices, they are just at a different path, and they ignore requests for grabbing. Allowing existing mapping tools to continue to work, as long as they discover those new devnodes.

The new-layer has to wait for each mapping-tool to report that it is done handling the event, and only then decide if the event should be forwarded or not. It won't forward it, if one of the tools reported that it is suppressed.

If a service/layer like this is written, then please

dependency injection architecture
well commented and documented code
unittests
low-level-language (probably Rust I guess, beware though that I can't write code in any low-level-language, and I'm not a very active programmer in my freetime anymore)

shiro commented 10 months ago

Would like to see someone make a proof of concept for this to test performance, lots of piping/polling going on, not sure how much latency this adds.

Maybe a wayland protocol would be a good place to put this, not sure if gnome/kde would pick it up though.

sezanzeb commented 10 months ago

Given that noone ever comlained about input-remapper having too much of a latency, even though it's written in python and never has seen any sort of optimization, I doubt it will be significant. But that is just my gut-feeling.

sezanzeb commented 10 months ago

I had to add to the proposal above, that the new-layer has to keep track of tools that are reading, in order to wait for each one of them to finish processing in order to know if the event is suppressed. I don't know if this is possible. Do owners of pipes have a way of knowing which processes are reading from it?

KarsMulder commented 10 months ago

I had to add to the proposal above, that the new-layer has to keep track of tools that are reading, in order to wait for each one of them to finish processing in order to know if the event is suppressed. I don't know if this is possible. Do owners of pipes have a way of knowing which processes are reading from it?

I think it is not possible to accomplish the above with just pipes because anything you write to a pipe can only be read by a single process anyway. Those "new readable pipes" would have to become Unix domain sockets instead. With sockets, it becomes also becomes possible to track which processes are listening as a nice side-effect.

Would like to see someone make a proof of concept for this to test performance, lots of piping/polling going on, not sure how much latency this adds.

I haven't tried implementing the proposed scheme, based on how quickly I've managed to get event round-trips to work in evsieve, I'd expect a latency of ~0.15ms when zero input remappers are in use (one round-trip; I assume that most of that latency comes from waiting for the scheduler to give the program some CPU time), and at least ~0.45ms when one or more input remappers are in use (which involves three round trips.) An inefficient Python implementation takes ~0.5ms for a single round-trip.

Assuming you're gaming on a cutting-edge 240 Hz monitor, a latency of 0.45ms would mean that there is about 11% chance that an input event gets delayed a single frame. Which is an acceptable delay in case you're actually using remappers.

For users not using remappers, I can however imagine that any scheme that proposes adding 0.15ms of latency to Wayland as a background service would receive more flak than dbus. Some people still don't accept that dbus adds enough value to be worth the couple of megabytes of memory it uses. If we want to go with the above scheme, I think it would greatly help adoption if it was a dynamic service that could be started on-demand when the first program needs it, rather than something the operating system is expected to keep alive whether the user wants it or not.

The protocol Note that it is not possible to just play out the evdev protocol as-is over a pipe because event devices accept more ioctls than just read(). For example, it is possible to query the capabilities of a device, query the current position of a joystick without having seen any event for them, and of course grab the devices. We would need to find a new protocol that either works fundamentally different from evdev, or encodes all actions that are possible in evdev over some bidirectional communication protocol.

Of course, libevdev (and python-evdev?) would need patches to be able to work on those sockets.

I personally think that the evdev protocol is a bit painful to work with. However, we must remember that the evdev protocol has been crafted by kernel developers which have seen every single crazy input device hardware manufacturers have devised, and the evdev protocol has stood the test of time for quite a while now.

I would be skeptical about proposals to replace the input stack that the kernel has built up with a new protocol in a userspace daemon just to make keymapping possible.

Alternative solution: can't we solve this in the kernel instead?

It is easy to jump to the idea of writing another userspace daemon because you can "just do" and does not require anyone's approval, but I wonder if our effort is better spent submitting patches to the kernel instead?

So far, a lot of event mappers for Wayland have decided that grabbing event devices and creating new event devices is a good idea. However, we're discussing creating an abstraction layer over them. This makes sense because there are several drawbacks to the approach of creating new event devices. From the top of my head, the big pain points are:

It takes a while for the new device to be recognized by programs, which is painful for short-lived scripts that just want to send some keys and then quit.
The new device does not take over the state of the old device. If the user has pressed a key on the keyboard and grab that device, that key will remain permanently pressed and cannot be released by a KEY_UP event on the virtual device.
Any configuration options the user has applied to the old device are not taken over by the new device. For example, if the used changed the mouse acceleration of their physical mouse, they need to re-configure that acceleration for the virtual mouse.
Event devices need to announce all event codes they can produce when they are created. When user keymapping scripts can be Turing-complete, it becomes impossible to reliably predict what those codes can be.
Event devices are valid for the whole system rather than a single user, and therefore require root-level permissions.

The kernel folks have already been kind to the keymapping community by giving us tools like uinput and grabbing event devices. And looking at the above list, I think all except the last pain point could be fixed if we had an additional ioctl (say, EVIOCSHADOW) which did the following:

Create a shadow device identical to an opened event device (say, shadow-1 and keyboard-1 respectively.)
Redirect all events originating from the kernel that would be sent to keyboard-1 to shadow-1;
Give a file descriptor of shadow-1 to the program that issued EVIOCSHADOW, and not to any other part of the system;
Allow the program that issued EVIOCSHADOW to treat keyboard-1 as if it were an uinput device;

In other words, a kernel ioctl that makes it possible for a program to change the events on an event device without the rest of the system having to notice that event devices are getting created, grabbed, or destroyed.

It would solve issue 1, 2, and 3. Issue 4 would remain; solving it would require some kind of extension to the evdev protocol to allow devices to change their capabilities, but that might run into backwards compatibility issues. Issue #5 would remain as well, but is more of a theoretical issue since most computers are single-user nowadays.

This way it also would make it easily possible to run 5 or 6 input handling tools simultaneously, since each tool can shadow the input device that was already shadowed by the previous tool without needing those tools to even be aware that there are other tools running as well.

Thinking about it, most of the pain points related to grabbing event devices for keymapping stem from the newly created uinput device being a distinct entity from the original device. If we could get a new ioctl that would allow us to sidestep that issue, about 60% of our problems would be solved without requiring a new userspace daemon.

kermitfrog commented 10 months ago

[design-diagram]

I'm also worried that the whole extra layer might add too much latency and complexity. If we do that, it should ideally be optional in the sense that it's only used when there are actually multiple tools trying to grab the same device.

My gut feeling says that the kernel approach is probably the better idea, but I have to think about that some more...

As for issue number 4:

Event devices need to announce all event codes they can produce when they are created. When user keymapping scripts can be Turing-complete, it becomes impossible to reliably predict what those codes can be.

I don't think it's such a big problem. In the first version of inputMangler, I didn't know that uinput could do everything I needed, so I wrote my own kernel-module which simply announced all the events that could make sense for that type of device - no matter whether those events were ever generated or not. As far as I remember, the only real issue was that the capabilities determine which kind of device the system believes it to be. If you do it wrong, a virtual joystick might be recognised as a tablet. But it's not too hard to figure out.

SDL (and by extension Steam) seems to differentiate between joystick and controller by looking up the vendor/product id in a database first, then defaulting to controller if the device has 6 axes. So it might be good to convince the SDL devs to reserve certain ranges of product ids for vendor 0x0000 for certain types of virtual devices to prevent issues (I had enough of these with the Spacemouse).

Of course, if we were to actually make one backend service to handle all possible input transformations, which has great performance, and so on, all of that might not even be necessary... well .. if it just was that easy..

Until any of this is implemented, maybe there is a workaround... the question is:

@pallaswept: do you need to have the same events processed be multiple tools that grab a device?

If e.g. you just need tool A to process mouse movements and tool B to process it's buttons, this might be solvable by splitting the events into 2 virtual devices. I think evsieve is currently the only tool that supports this, so that would be the lowest layer. Then tool A could grab the move device and tool B the button device.

There might be some problems with tools reading the virtual devices if all of them have the same vendor/product-id, but uinput allows that to be changed. @KarsMulder does evsieve support setting those ids?

It might also be neccessary to unite those devices later, but I believe most of the tools here do that anyway.

sezanzeb commented 10 months ago

I think it is not possible to accomplish the above with just pipes because anything you write to a pipe can only be read by a single process anyway. Those "new readable pipes" would have to become Unix domain sockets instead. With sockets, it becomes also becomes possible to track which processes are listening as a nice side-effect.

Thanks for the clarification

As far as I remember, the only real issue was that the capabilities determine which kind of device the system believes it to be. If you do it wrong, a virtual joystick might be recognised as a tablet. But it's not too hard to figure out.

Couldn't get it to work for a stylus, but yeah, it can be figured out somehow usually. I wish it was determined by some sort of enum value instead that is being reported by a device.

Issue 4 would remain; solving it would require some kind of extension to the evdev protocol to allow devices to change their capabilities

If events contain that enum, the system could decide to treat is as joystick movement, while ignoring any device capabilities, couldn't it?

This way it also would make it easily possible to run 5 or 6 input handling tools simultaneously, since each tool can shadow the input device that was already shadowed by the previous tool without needing those tools to even be aware that there are other tools running as well.

@KarsMulder something like this? When the hardware reports "a", each shadowed device receives the event for "a", and each tool reads "a". What if each tool decides to not map this key and just forward it, will "aa" be written to "Keyboard"?

KarsMulder commented 10 months ago

Something like this. Imagine that the default setup is like this:

nomap

(I suppose this is slightly oversimplified since the read() system call that Wayland makes needs to pass through the kernel as well, but anyway.)

A physical keyboard emits events to the kernel. The kernel sends those events to an event device keyboard-1. Wayland and other processes on the system can read those events.

Now suppose a program "Mapper #1" comes along which issues the hypothetical EVIOCSHADOW on keyboard-1. The kernel will then adjust the topology to become like the following:

onemap

The kernel stops writing the events from the physical keyboard to keyboard-1. Instead, it writes them to shadow-1, an event device that is only accessible to Mapper #1 and no other part of the system. Mapper #1 get a file descriptor for shadow-1, but shadow-1 does not show up in /dev/input or anywhere else. The state of shadow-1 is identical to the state of keyboard-1 at the time that EVIOCSHADOW was issued, e.g. any keys that were pressed on keyboard-1 are also considered to be pressed on shadow-1.

The program Mapper #1 can now read events from shadow-1 like it can read them from any other event device. If Mapper #1 does nothing, then no events get written to keyboard-1 and the whole system loses access to the keyboard just as if it had been grabbed. Mapper #1 can write events to keyboard-1 like it can write events to any other uinput device. The events it writes to keyboard-1 can be read by Wayland and so on.

The mapper scripts do not explicitly announce that they want to drop any particular event, events can simply be dropped as consequence of a mapper script reading an event from a shadow device and then not writing that event to its output device.

This is basically the trick of "grab an input device and create another uinput device", except this whole process is invisible to Wayland. Wayland can just keep reading events from keyboard-1 as if nothing happened, whereas with the old method Wayland would have to notice that another input device was created and open it, without even knowing that this new input device was related to another device.

When another script, say Mapper #2 also issues EVIOCSHADOW on keyboard-1, the event chain becomes:

twomap

Just like the events from the keyboard got redirected to shadow-1 when Mapper #1 issued EVIOCSHADOW, a second invocation of EVIOCSHADOW causes the events that Mapper #1 writes to be redirected to shadow-2. This means that all events from the physical keyboard first pass through Mapper #1, then through Mapper #2, and finally back to Wayland and the rest of the system.

KarsMulder commented 10 months ago

There might be some problems with tools reading the virtual devices if all of them have the same vendor/product-id, but uinput allows that to be changed. @KarsMulder does evsieve support setting those ids?

It currently doesn't because I actually wasn't even aware that event devices hat vendor and product ids. I thought that was something that only existed at the USB-device level, but I guess I was wrong.

It doesn't seem like a difficult feature to add. I'll get around to it when I figure out what the CLI arguments should be.

(Should --output accept a clause like device-id=045e:082c or should that require two clauses like vendor-id=045e product-id=082c? The latter seems unnecessarily verbose, but the former gives the impression that vid:pid are the only two things that matter for a device ID and forgets about the bus number and version number. I suppose I'm going to need clauses like bus=3 and version=111 too, unless there is some standard format to make them fit in a single clause like device-id=3:045e:082c:111. Also, should bus and version number be specified in decimal or hexadecimal format? Usually you think about those things as decimal, but evtest reports them as hexadecimal and the vid:pid are hexadecimal too...)

kermitfrog commented 10 months ago

Couldn't get it to work for a stylus, but yeah, it can be figured out somehow usually. I wish it was determined by some sort of enum value instead that is being reported by a device. [..] If events contain that enum, the system could decide to treat is as joystick movement, while ignoring any device capabilities, couldn't it?

An enum for the device type would be nice :) But I wouldn't send it with every event. If I had to handle a specific device in my end-user-application/GUI-library, I'd want my handler to be able to rely on all events belonging to one device type and use different handler for a different device class. An enum per event would just add bloat at multiple levels. Mixing such stuff is what we do ;)

Hm.. that makes me wonder if we could speed up input events on linux by truncating the timestamp to the final 16 bits. I'm not sure the rest is really needed anyway.

[..] When another script, say Mapper # 2 also issues EVIOCSHADOW on keyboard-1, the event chain becomes: [..]

This would also reduce the systems number of virtual devices. We would still need them for events that don't fit into existing ones..

Some things to decide on:

does Mapper # 2 see the original events, or those sent by Mapper # 1?
- if the latter: lets asume Mapper # 1 translates a keyboard event to a mouse event.. does that affect Mapper # 2?
- do we inject the new event into a shadowed mouse if capabilities match or always into a virtual one?

I'll get around to it when I figure out what the CLI arguments should be.

I'd use device-id=045e:082c with a possible shorthand device-id=:082c when vendor-id is 0000, as both are always needed to identify a device. Hex seems the better choice -- all tools seem to report them in hex and deviating from that would just make it harder for the user.

KarsMulder commented 10 months ago

does Mapper # 2 see the original events, or those sent by Mapper # 1?

Those sent by Mapper #1. The effect is the same as if Mapper #1 created a virtual device shadow-2 which was subsequently grabbed by Mapper #2.

the latter: lets asume Mapper # 1 translates a keyboard event to a mouse event.. does that affect Mapper # 2?

The shadow-* devices all must have the same capabilities as the original keyboard-1 device. Assuming that keyboard-1 didn't just happen to have an integrated mouse, it is not possible for Mapper #1 to write mouse events to keyboard-1.

When Mapper #2 starts and silently replaces keyboard-1 with shadow-2, this transition is supposed to be invisible to both Mapper #1 and to Wayland. As such, Mapper #1 can still not write mouse events to keyboard-1/shadow-2.

It would be possible for Mapper #1 to shadow another mouse device (or create a new virtual one) and write mouse events to that device.

Either way, Mapper #2 will not be able to observe any mouse events getting emitted by the keyboard device. If Mapper #2 does want to observe mouse events, it should listen to or shadow a mouse device as well.

do we inject the new event into a shadowed mouse if capabilities match or always into a virtual one?

I imagine that writing events to keyboard-1/shadow-2/whatever would follow the same rules as writing events to any other virtual device: events that do not match the capabilities of the virtual device get silently dropped. It is the job of the mapper script to ensure that it is writes its events to devices that are capable of them.

pallaswept commented 10 months ago

This thread is pure gold so far and I want to thank you all sincerely for your input. I hoped but never imagined I'd have such a positive response, thanks so much.

@pallaswept: do you need to have the same events processed be multiple tools that grab a device?

Yes it's pretty frequent. Just to make matters worse, it's also fairly common to need to process the same event (say, pressing the ctrl key) by multiple tools, from multiple devices. Like say, maybe one day I can't use my left hand, so I'll rebind a mouse button to a ctrl key, and I'll need the footswitch to read the ctrl keypress regardless of where it came from, from the keyboard, some other device (bluetooth keyboard), the re-bound mouse button, on-sreen keboard, etc, to modify the footswitch's behaviour, or I might use that same ctrl key to modify the behaviour of some other keybind, in another tool. Just to give a curly example.

I'd echo everything that's been said ITT so far. A middle layer is the least reliant on outside support, but it does have shortcomings. I also feel like the kernel is the best place to be doing this, from a functional point of view. A Wayland protocol might also be as functional, but then there's a reliance on its implementation from every compositor, and it might take a very long time to become a reality, or just never happen. I do have similar fears about doing this in the kernel, though. I wonder how hard it would be to get the kernel maintainers interested in such a thing, enough that it could become a reality. Requiring a custom patched kernel would make it somewhat prohibitive. That being said, if it could be a kernel module, then that makes things a lot simpler for the end-user to implement.

And yeh, thanks again for this amazing conversation. Your input is priceless. Please forgive my lack of input, I'm mostly just trying to stay out of the way right now :)

kermitfrog commented 10 months ago

little time today, so I'll be brief:

does Mapper # 2 see the original events, or those sent by Mapper # 1? Those sent by Mapper #1. The effect is the same as if Mapper #1 created a virtual device shadow-2 which was subsequently grabbed by Mapper #2.

Ok, EVIOCSHADOW sounds good so far. One more thing: having several tools start up in parallel and some of them release and reshadow devices on configuration change will make the order of event processing random. This can be solved by adding a priority parameter to the ioctl call. Tools need to make this configureable by the user.

[..] I do have similar fears about doing this in the kernel, though. I wonder how hard it would be to get the kernel maintainers interested in such a thing, enough that it could become a reality. Requiring a custom patched kernel would make it somewhat prohibitive. That being said, if it could be a kernel module, then that makes things a lot simpler for the end-user to implement.

I don't think we can just make this a new kernel module and load it. It would have to extend evdev, which is part of the kernel. Maybe evdev can be compiled as a module and then replaced by a patched version - but both mean compiling your own kernel :/.

So the next step (if noone has anything to add) would be talking to the evdev developers.

kermitfrog commented 10 months ago

An enum for the device type would be nice :) On second thought this won't work :( . The possible sources for that information are:

deducting from capabilities

database lookup

manufacturer provided

1 & 2 are better left to userspace. 3 would require a new USB-specification and can not be trusted to be reliably anyway due to manufacturers sometimes having different opinions on what a device is than users (see Spacemouse)

pallaswept commented 10 months ago

Maybe evdev can be compiled as a module and then replaced by a patched version

I might be wrong about this but I think it should be possible to block the init call to the built-in evdev module, and load a dynamically loaded module over the top of it.... But still, it would be much nicer to just get this mainlined.

So the next step (if noone has anything to add) would be talking to the evdev developers.

I kinda like to think that they'll understand the need for this, and see there's a lot of good thought that's gone into it here, so hopefully they'll hear the idea and be fully on board and maybe even have some ideas to contribute as well (it is certainly their wheelhouse!). If they deny it from the outset I guess we'll deal with that when it comes to it, but I'll try to remain optimistic. I don't suppose any of you have a line to someone appropriate to start such a discussion? Or is this the kind of thing that should go direct to the mailing list in the wild?

KarsMulder commented 10 months ago

One more thing: having several tools start up in parallel and some of them release and reshadow devices on configuration change will make the order of event processing random. This can be solved by adding a priority parameter to the ioctl call. Tools need to make this configureable by the user.

This is a good point.

If we want to uphold the tenet that it should not matter to the rest of the system whether a device has been shadowed or not, then the only sane position to place a new mapper script is at the end of the chain, because that is the only position where the shadow device has exactly the same state as the opened keyboard-1 had. If it were inserted at another position in the chain, it might suddenly observe that, without any event getting sent inbetween, the caps-lock key has been pressed and the ctrl key has been released.

However, if that approach is taken, then the final result of the mapper chain will depend on the order in which the scripts were started. And if that order is nondeterministic, we're in for a load of chaos.

It would be possible to use an external service like systemd to determine the order in which the scripts start. However, that has disadvantages:

Using systemd to order the startup of mapper scripts is not user friendly and hence not the solution we should be working towards;
Using a "mapper script startup manager" to order the startup may require the collaboration of the scripts and adds yet another moving part to the operating system;
Whichever of the above approaches is taken, the whole chain of scripts would have to be restarted when the user wants to insert "Mapper #3" between "Mapper #1" and "Mapper #2", which runs counter to the goal of minimizing the disruption that adding additional mapper scripts causes.

As such, I agree that it is important that we can get support for that priority parameter on kernel level.

Intermezzo: some observations

Before I talk some more, I'd like to point out some observations I made of how the event devices currently work.

Writing events to physical devices can already be done Although the documentation suggests that you can only write events to uinput devices (e.g. the corresponding libevdev function is called libevdev_uinput_write_event, it is actually possible to write events to physical event devices:

#!/usr/bin/env python3

import evdev
import evdev.ecodes as e

device = evdev.InputDevice("/dev/input/by-id/keyboard")
device.write(e.EV_KEY, e.KEY_A, 1)
device.write(e.EV_KEY, e.KEY_A, 0)
device.write(e.EV_SYN, 0, 0)

If you run this script, you will see that an "a" key gets written to your terminal.

This means that if some short-lived script just wanted to inject some events, it would be possible to do so by opening an existing device, writing events to it, and then closing it. The trouble lies more in figuring out how to block events (like if you want to map caps lock to ctrl, you need to block caps lock somehow) than in how to generate events.

If you write an event to a device, it will be broadcast to all readers including yourself

If you append the following lines to the end of the above script, you will see that in addition to printing "a" to the terminal, this script will also observe its own KEY_A event:

for event in device.read():
    print(event)

Only one program can grab a device at once

Evtest can tell you whether a device has been grabbed already. The function libevdev_grab will return a negative status code if you try to grab a device that is already grabbed by something else.

You can write events to a physical device that has been grabbed by another program

Tested by running evsieve --input /dev/input/by-id/keyboard grab --print and then writing to it using the above python script. Just make sure you don't accidentally lock yourself out of using your keyboard if you only have one keyboard.

A new perspective: how about filtering devices instead of shadowing them?

So, back to the main point. Keeping in mind that (1) we can already write events to real event devices, and (2) the real trouble is hiding events rather than writing them, I've thought of a new model of the solution that is possibly easier to wrap your mind around. I propose we split up each event device into layers:

withlayers

For simplicity, I've drawn eight layers, but we're probably going to want more of them. If I had to give a number, I'd say 65536 layers ought to be enough for anyone.

(Of course those layers should be stored like a sparse vector. I am not suggesting that the kernel keeps an array of length 65536 in memory. It should just remember a vector of pairs (layer number, progam that has this layer open) that is sorted by layer number.

For each layer, it is possible to:

Read all events that arrive or are written to that layer;
Write events to that layer;
Grab that specific layer.

The kernel writes all events from physical devices to layer 0 of the event device. Each event automatically traverses each layer until it encounters a layer that has been grabbed by something. So, if none of the layers are grabbed, any event written to layer 0 will be read by all programs that are listening at any layer.

Furthermore, as long as no layer is grabbed, it would not matter whether a program was listening to layer 7 or layer 5:

withlayers_separate

The behaviour of this setup would be identical to the first image. There is no ordering guarantee that Xorg will see the events before Wayland does. Any event from the keyboard shows up simultaneously on both Xorg's and Wayland's view of the event device, and whoever reads that event first depends on the whims of the CPU scheduler.

(Also, the above is theoretical. A well-behaved program should not listen to any layer but the last one unless it has a good reason to.)

The layer numbers become interesting when one program grabs a layer. If a layer is grabbed (say, layer 4) and an event is written to a lower layer, then that event will only traverse all listeners up to the layer that is grabbed. For example, a Mapper script could decide to grab layer 3 to make sure that no event emitted by the physical keyboard traverses further than layer 3. It can then write its events to layer 4 so that all programs that are listening to layer 4 or later can read them:

withlayers_grabbed

Another mapper script (say, Mapper #2) could then grab layer 2 if it wants to process events before the already-running mapper script, or could grab layer 4 if it wants to process events after the already-running script.

From userspace perspective, the API works like this:

Whenever you open() an event device, you automatically open it at the last layer available;
You can issue an ioctl like EVIOCSWITCHLAYER to switch to another layer.

And for avoidance of doubt, here is how the edge cases are resolved. All of the following is backwards compatible with the current behaviour of event devices:

Only a single program can grab a particular layer at any time;
A program listening to a particular layer will not be able to see any events if another program has grabbed that layer;
It is possible to write events to a layer that has been grabbed by another program;
If a layer has not been grabbed, all events written to it can be read by all programs listening to the same layer.

I think that this new model makes it easier to think about how events propagate because they do not involve any anonymous shadow devices which need to be kept track of and possibly get re-linked when mappers start or exit. I do not know whether it would be easier to implement in the kernel though.

sezanzeb commented 10 months ago

It is possible to write events to a layer that has been grabbed by another program;

For each layer, it is possible to: Write events to that layer;

Are you suggesting to allow writing to an arbitrary layer? So when writing an event, the mapping-script has to take care to write to the correct layer?

Or will mapper scripts write back to the keyboard-event-device, and the event automatically goes into the next layer? In that case, how would the kernel distinguish between events written from different layers in order to know what the next layer is?

KarsMulder commented 10 months ago

I suggest allowing scripts to write to arbitrary layers to keep the API consistent. That said, any well-behaved mapping script should only ever write to the layer directly after the one they grabbed.

If the script were to write to the same layer that they grabbed, then the only the script itself would be able to read the events it just wrote, which accomplishes nothing. Writing to a layer lower than the layer they grabbed will just result in those events propagating back to the script sooner or later. Writing to a layer that is more than one later risks inadvertently skipping over other mapper scripts.

The standard approach for introducing a mapper script would be:

Open an event device twice to get two file descriptors (or open it once and then duplicate it, if we can get the kernel to not share the active layer between duplicated descriptors. It would avoid a race condition, but I'm not sure if duplicated file descriptors conventionally diverge like that);
Invoke EVIOCSWITCHLAYER to switch the first descriptor to layer n and grab it;
Invoke EVIOCSWITCHLAYER to switch the second descriptor to layer n+1;
Read events from the first file descriptor and write events to the second file descriptor.

I suppose the other way to implement this API would be "a file descriptor on layer n reads events from layer n and writes them to layer n+1, unless n is the last layer, in which case it writes events to layer n", which has potentially less chance of developers messing up the above convention, but at the same time feels really inconsistent. That would mean that a script could read back the events it writes to the device if and only if the last layer is active. Such exceptions in APIs are the bane of correct programs.

(Edit: then again, the last layer is special anyhow because you cannot just open layer n+1 when n is the last layer. And race conditions are the bane of correct programs as well. Maybe this requires some more thought...)

KarsMulder commented 10 months ago

I don't suppose any of you have a line to someone appropriate to start such a discussion? Or is this the kind of thing that should go direct to the mailing list in the wild?

The kernel development guide encourages you to contact the kernel community early about your planned changes before you actually start working on implementing those changes. I suppose that once we've reached some degree of consensus over here, we should ask the kernel community what they think of our approach.

In addition to the main kernel mailing list, the kernel also has various smaller mailing lists for more specific topics. I think that the linux-input mailing list would be the right place to post these ideas.

pallaswept commented 10 months ago

While thinking this through, I considered this quite a bit with my hypothetical mappings, and after much thought, I feel that in order to have a deterministic output, it's going to be a prerequisite to start, or at least process, the manipulation applications, in a deterministic order.

It would be possible to use an external service like systemd to determine the order in which the scripts start. However, that has disadvantages:

Using systemd to order the startup of mapper scripts is not user friendly and hence not the solution we should be working towards;

I also felt like it's a task well suited to systemd, and honestly I think it's not that prohibitive. The Before and After directives in unit files are pretty straightforward, not too difficult to implement and fairly intuitive, and

the whole chain of scripts would have to be restarted when the user wants to insert "Mapper #3" between "Mapper #1" and "Mapper #2", which runs counter to the goal of minimizing the disruption that adding additional mapper scripts causes.

That shouldn't be a problem. Say 'mapper 2' is set to run After 'mapper 1', a newly added 'mapper 3' set to run Before 'mapper 2' and After 'Mapper 1', or even changing 'mapper 2' to run After 'mapper 3', provided 'Mapper 3' is also set to run After 'Mapper 1', shouldn't interfere with the running of 'mapper 1' and '2' at all. SystemD does a pretty good job of sorting out all these priorities, even on the fly.

Something which I'm sure is of concern to you guys in particular, is the difficulty in migrating existing tools to this new implementation. The ability to write directly to real devices certainly simplifies things on that front. Dodging the need to add code to handle a whole new type of input device would be a major plus. Leveraging systemd to determine the order of event processing would certainly be easier from a migration perspective than per-mapper priorities. and given that

any well-behaved mapping script should only ever write to the layer directly after the one they grabbed. If the script were to write to the same layer that they grabbed, then the only the script itself would be able to read the events it just wrote, which accomplishes nothing. Writing to a layer lower than the layer they grabbed will just result in those events propagating back to the script sooner or later. Writing to a layer that is more than one later risks inadvertently skipping over other mapper scripts.

It seems that the design is sufficiently firm in its structure (always grab layer n and output to n+1), that simply ordering the start of the 'Mappers' would get the job done.

That doesn't prohibit

allowing scripts to write to arbitrary layers to keep the API consistent.

And allowing it to be flexible shouldn't; hurt anything - provided that its never misused/abused, and generally the client would stick with 'n -> n+1'.

And while I feel like systemd could take care of this, it's obviously something that would be done equally well by any priority manager, and if that's a kernel-based property given to or dictated by the mapper, during the grab, it would work at least as well. So, I'm not firmly insistent upon either approach, But I do get the feeling that systemd might just be enough.

Regarding the last layer:

the last layer is special anyhow

I agree with all you said about this. I'd say the solution is pretty simple - don't allow the last layer to be grabbed. Consider it 'output-only'. I'd say it's unlikely with 64K of them that it would ever even be requested, but at the end of the day, something has to be the end of this chain, and preventing grabbing that last layer (ie just returning an error when requesting the IOCTL) enforces that as a simple and strict reality - the buck stops there.

KarsMulder commented 10 months ago

I also felt like it's a task well suited to systemd, and honestly I think it's not that prohibitive. The Before and After directives in unit files are pretty straightforward, not too difficult to implement and fairly intuitive, and

I mean, if you were so inclined, you could already chain all the mapper scripts you want as long as you can:

Tell every mapper script exactly which input and output devices it should use;
Start the scripts in the right order to make sure that the next script starts after the output devices of the previous script are available.

The problem is that it is a major headache to configure all of this, and even after you set it up, the whole structure of scripts is linked to each other like a house of cards. If one script exits, restarts, or a new script were added to the mix, the whole stack of cards would have to be restarted.

As you said yourself:

Back in windows-land, it's not even a bat of an eyelid to be running 5 or 6 input handling tools like this simultaneously. Nobody talks about it because it's normal.

I know that it is ironic that I'm saying this since my program evsieve is clearly a tool that is meant for power users, but I think that it is ridiculous that a Linux user should be expected to jump through the hoops of having to write systemd unit files just to make something work that works pretty much automatically on Windows. Not all disabled people are programmers.

That said, I think you are on to something. Looking back at the list of issues I mentioned, issue 1 (trouble for short-lived scripts) could be solved already if it is possible to write events to physical devices.

Issue number 2 (virtual devices do not take over state) is maybe not that bad if all virtual devices are long-lived, i.e. they only have to be reset if the user decides to change their keymapping configuration. In those cases it wouldn't be that bad if the whole house of cards collapsed and had to be restarted.

Issue number 3, well... if we could just convince the Wayland devs that they should ignore the existence of all grabbed devices and carry over their configuration to virtual devices with the same device id, we might come far?

Anyway, whether we're going with the shadow approach or the layer approach or the userspace daemon approach, existing mapper scripts would have to be updated to take advantage of it. Without updates, they will just continue grabbing devices and creating new devices.

With the layer approach, we're basically asking the kernel to orchestrate how events should flow from mapper script to mapper script. But maybe if we created:

A standard configuration format and location that tells you in which order event mapping scripts should start, which input devices they should take, which output devices they should generate;
A CLI/GUI tool that allows the user to configure in which order they want to start their tools/scripts, and then automatically generates the configuration files for (1) along with the systemd configuration files, and can restart the whole house of cards when the configuration changes;
A Python/Rust/C library which can read the format of (1) and automatically open the right input devices and create the right output devices.

Then we may be able to make all scripts that use this library work together as seamlessly as is possible. We don't really need a daemon to relay events between programs, just a standard on configuration files so that all collaborative tools know where to get their input and where to put their output.

The CLI/GUI tool should let you configure the following:

I want to use this program with that configuration file;
I want to feed it these input devices.

The library should help the scripts announce which types of input devices they expect to see and which types of output they intend to generate for a given configuration file.

And then you can draw lines between how the output devices of some script should connect to input devices of other scripts. With some quality of life features like automatically drawing the lines if the connection seems to be obvious, which it usually is. (Like you start a script mapping a keyboard to a keyboard+mouse, and the next script you enter expects a mouse.)

The library may look a bit like this in pseudo-Python-code that I haven't given much thought to yet:

import evdev
import evmap  # <-- our new library, name provisory
import asyncio

def parse_configuration_file(path):
    # Your code here.
    pass

configuration = parse_configuration_file(evmap.get_configuration_file())

# Tell evmap how many input and output devices you expect based on the configuration
# file you just parsed. Let's say that this program always expects one input keyboard
# and one output keyboard regardless of the configuration. Also give that device a
# friendly name for when the user wants to configure this script.
keyboard_in = evmap.declare_input_device(
    evmap.DeviceType.KEYBOARD, name="Primary keyboard"
)

# Tell evmap the expected output devices and their capabilities.
keyboard_out = evmap.declare_output_device(
    evmap.DeviceType.KEYBOARD, evmap.capabilities_of(keyboard_in), name="Fancy keyboard"
)

# Tell evmap that you're done configuring the input and output devices.
# If this is not a dry run, it will verify whether the input/output devices
# you just declared match the system configuration for this script.
#
# If this is not a dry run and no system configuration has been defined,
# takes some arbitrary input devices? (Maybe those arbitrary devices
# should've been taken earlier already to get the capabilities of them?)
evmap.finalize_configuration()

# The configuration tool may make dry-runs of your program to see what kind of input
# and output devices would be expected for a particular configuration file, so the
# user does not have to tell that to the configuration tool.
if evmap.is_dry_run():
    exit(0)

# Calling init() automatically opens and grabs all input devices it is meant to grab,
# creates the output devices it is meant to create. If this is a dry run, the following
# function will throw an exception.
devices = evmap.init_devices()
device_in: evdev.InputDevice = devices.get(keyboard_in)
device_out: evdev.UInput = devices.get(keyboard_out)

# Unless this script needs to create additional communication channels, just having
# the output devices existing should be enough to tell systemd that this script is
# ready and the next one can be started already.
evmap.announce_ready()

# After this point, the evmap library becomes wholly unnecessary and you only need
# to use python-evdev henceforth.
async def map_keyboard(input: evdev.InputDevice, output: evdev.UInput):
    # Your program goes here.
    pass

asyncio.run(map_keyboard(device_in, device_out))

The major issue is that this is a house of cards which collapses whenever (1) one script in the chain crashes, (2) a script in the chain restarts, (3) the configuration changes, (4) a device gets detached, and (5) potentially collapses whenever a new device gets attached.

KarsMulder commented 10 months ago

I feel like this "evmap" library is not the way to go. Too fragile. Simultaneously too big and too small.

I'm starting to feel like we're just unnecessarily dancing through hoops because Wayland refuses to provide some event mapping protocol.

So many ways to go about this. I need more time to think.

pallaswept commented 10 months ago

I think that it is ridiculous that a Linux user should be expected to jump through the hoops of having to write systemd unit files just to make something work that works pretty much automatically on Windows. Not all disabled people are programmers.

QFT. And for what it's worth, another guy I know who needs these same capabilities is totally able-bodied; he just has a crazy racecar and plane cockpit simulator with more input devices than I could count., that all need to behave differently depending on the game he's playing.

You're right, it needs to be simpler. It should all just be 'normal'.

KarsMulder commented 10 months ago

You know, so far we've been focused on mapping event devices because event devices are the one thing that we actually can map under Wayland, but we've been forgetting that there are actually multiple layers to input, and the latter layers may be just as important for macro's or disabled people:

Device layer First of all, there is the device layer, which represents the state of the physical devices. Currently, evdev is used to report that state to userspace. This layer should merely concern itself with whether a certain keyboard key is down or not, or what the current axis is of a certain joystick.

The kernel makes it possible for us to map the device layer, so we're trying to archive our goals on the device layer, even though for many purpose this may not actually be the right layer to act on.

Input layer Then, there is the input layer. This is the layer that translates a key state moving from up to down into a "key press" event, that translates the joystick moving out of its dead zone as a "axis move" event, that scales mouse movement based on its acceleration profile, and interprets gestures made on a touchpad. This tends to be handled by libinput.

Although simple cases like key presses appear to map 1:1 between the device and input layer, some more advanced things like generating a gesture would be something that is wholly inapropriate to implement on the device layer.

IME Layer ("Input Method Editor") Finally, there is the IME layer, which concerns itself not so much with which input actions were taken, but rather with how they modify the state of the GUI, particularly text fields. This layer knows where on the screen the text cursor is, what the content of said textbox is, and can modify the content of that textbox directly.

IME's are mainly used for some foreign languages like Chinese and Japanese that contain thousands of characters, where the standard way of input is to type a sentence phonetically and then have the IME use "smart" analysis (in quotes because all IME's on Linux are dumb as rocks) to figure out which characters should've been used all along. A Japanese user might type "いしのうえにもさんねん", press the convert key, and then have the whole sentence tranformed into "石の上にも三年".

Even for English, I suppose that adjusting the IME layer may be helpful for features like AHK's Hotstrings (something that automatically replaces a word you type with another word), or for voice-based disability assistants that allow you to use voice commands to move the cursor to particular positions in text or modify it in ways that are not just inserting more words, like saying "delete line" to remove however many characters happen to be in the current line.

As you can see on the table on the Archwiki, the current situation for IME's is very much nonstandardised, though it does seem that there is an unstable Wayland protocol being drafted.

The point of the above is that there are multiple layers to the input system, and if we want a well-oiled input mapping&macro system to help disabled users, just being able to modify the lowest layer may not be sufficient. Ideally we would have a way to insert a hook at any point in the chain "device layer → input layer → IME layer" as well as before and after other hooks, but many of us ended up focusing only on the device layer because the Linux kernel is being cooperative to mapping.

I'm not going to suggest any solution for now because I really should get myself to understand the whole input stack better before I can make any reasonable proposal. Reading up on the whole input stack may take me quite a while though.

sezanzeb commented 10 months ago

That shouldn't be a problem. Say 'mapper 2' is set to run After 'mapper 1', a newly added 'mapper 3' set to run Before 'mapper 2' and After 'Mapper 1', or even changing 'mapper 2' to run After 'mapper 3', provided 'Mapper 3' is also set to run After 'Mapper 1', shouldn't interfere with the running of 'mapper 1' and '2' at all. SystemD does a pretty good job of sorting out all these priorities, even on the fly.

But service files usually come as part of the package in the repo, so it might break every time the tool is reinstalled. Furthermore, I don't think the Year of the Linux Desktop will ever be a reality if people are required to edit config files. Some people are already used to it I guess, but it would be nice if this could be controlled via the gui in the mapping tools. They can check which layers are free, right? So they can offer an input to rearrange the mapping-tool.

the difficulty in migrating existing tools to this new implementation

python-evdev would have to be extended with EVIOCSHADOW, and I have never looked at the c code at all. I can't even really write c. Other than that, as long as we can still create our custom uinput devices (for fake-gamepads and such) it should be fine. All I need to ensure is, that forwarded events go into the next layer instead of a new uinput.

I haven't read into the IME layer yet and I have never heard of it. My knowledge has been that mapping to characters that are not part of the keyboard layout is difficult, and it requires editing the keyboard layout and has often been a pain point for users. I don't know if keyboard layouts can be edited in wayland at all. Aren't xkb files used by wayland as well? Configuring those is a nightmare and not a viable alternative.

Since you have been talking about keeping events as slim and small as possible, I guess this isn't a good idea, but it would be nice if events could contain an unicode code as an alternative to (EV_KEY, KEY_A, 1).

If mapping happened within wayland, existing tools require bigger redesigns obviously, and it wouldn't be compatible with X (does compatibility with X matter anymore?). Also, wayland was specifically designed to stop keyloggers afaik (and therefore mapping scripts wouldn't be possible), though there exists this: https://github.com/Aishou/wayland-keylogger. I don't know what the current state is in that regard.

In X mapping a mouse-button to shift (via setxkbmap) is problematic, because you couldn't use it to modify keys on your keyboard. You would have to use the shift key on the keyboard for that. I wonder if that would be a problem in wayland as well.

sezanzeb commented 10 months ago

Another often requested feature is changing mappings based on the active application, which is also not possible in wayland without relying on custom dbus messages from gnome or something.

Mapping scripts therefore need to have some privileged access to wayland internals. Unfortunately my knowledge of the desktop stack is far too limited for me to comment on the viability of that.

kermitfrog commented 10 months ago

Sooooo much text - I'll comment on it in order of appearance...

Maybe evdev can be compiled as a module and then replaced by a patched version

I might be wrong about this but I think it should be possible to block the init call to the built-in evdev module, and load a dynamically loaded module over the top of it.... But still, it would be much nicer to just get this mainlined.

Hm.. maybe it can be disabled with a kernel parameter? If so: what if there are modules that depend on it? Ok, maybe it can, but I'm not sure how complicated it will be - but the evdev developers will surely know.

If we want to uphold the tenet that it should not matter to the rest of the system whether a device has been shadowed or not, then the only sane position to place a new mapper script is at the end of the chain, because that is the only position where the shadow device has exactly the same state as the opened keyboard-1 had. If it were inserted at another position in the chain, it might suddenly observe that, without any event getting sent inbetween, the caps-lock key has been pressed and the ctrl key has been released.

Ok, a key getting transformed between press and release is a problem, but I feel crafting a complex solution to avoid that potential problem might just not be worth the trouble...

Inserting (or removing) a new tool in the chain is a rare event which I'd expect only to happen in a few cases:

change of configuration -- this should only be triggered manually by the user
(re)start or shutdown of the tool -- manually by the user or on startup
device is (dis)connected -- manually by the user
a tool crashed -- this is bad and we should expect key states to be messed up anyway

If the user triggers it, I believe they will do it by releasing keys in a state that won't cause these problems. Or at least they might learn to do that after a while.. Ok, thats not a solution, but then, EVIOCSHADOW|layers isn't the main cause of such errors anyway. Let's consider the following scenarios, where the user will use a shortcut containing Meta to switch to a different application:

An application (let's say an RDP-client) grabs the keyboard so global shortcuts can be used on the remote computer. The user switches to this application, then later switches to a different window using the mouse.
The user uses an input mapper and has Meta configured to Ctrl in the application they are switching to. The current preset is changed before Meta (now Ctrl) is released.

In both cases Meta release is never seen by X/Wayland and I believe such things will happen more often than a change in the chain. It's annoying, but people deal with it (e.g. press and release Meta again or switch to a tty and back to make X reset state).

So, I wonder: if this is a very rare case, might it not be more practical to provide some way to trigger a full input reset (e.g. by very low level shortcut) that will send a signal to all tools listening to any input device (including X/Wayland - and by that solving the above problems as well) to reset key state (and in some tools: variables)? Of course this has to be communicated well (e.g. "If things get weird, try pressing Ctrl+Alt+Pause" in the documentation of every input-related tool).

Writing events to physical devices can already be done Although the documentation suggests that you can only write events to uinput devices (e.g. the corresponding libevdev function is called libevdev_uinput_write_event, it is actually possible to write events to physical event devices:

Yes, but I'm not sure we should..

More than a decade ago I was using an input remapping tool called gizmod which, as far as I remember, generated events by writing to the real evdev device files. At some point it stopped working with newer kernels, which made me write inputMangler..

Before that it worked mostly fine.. mostly.. It's too long ago to remember what exactly did not work, but I'm sure there were problems with some applications. Maybe these were related to to the reason it stopped working - which, if I understood correctly, was because it relied on some kernel bug related to polling evdev devices, which was eventually fixed. Maybe whatever caused them is simply not a problem now.. In any case these memories makes me sceptical about that whole write-to-existing-evdev-devices thing.

A new perspective: how about filtering devices instead of shadowing them?

The only functional difference I see compared to EVIOCSHADOW with priorities is that a client could listen to events that happen before a particular input mappper transforms them. Am I missing something?

In addition to the main kernel mailing list, the kernel also has various smaller mailing lists for more specific topics. I think that the linux-input mailing list would be the right place to post these ideas.

I agree.

With the layer approach, we're basically asking the kernel to orchestrate how events should flow from mapper script to mapper script. But maybe if we created:

A standard configuration format and location that tells you in which order event mapping scripts should start, which input devices they should take, which output devices they should generate;

A CLI/GUI tool that allows the user to configure in which order they want to start their tools/scripts, and then automatically generates the configuration files for (1) along with the systemd configuration files, and can restart the whole house of cards when the configuration changes;

A Python/Rust/C library which can read the format of (1) and automatically open the right input devices and create the right output devices.

But we must also consider the case where a user might want one device processed by tool A first and B later but the order B, then A for another device. EVIOCSHADOW + priority would allow this, having the order configured by systemd might make that difficult. On the other hand: if we allow that case, if might cause an infinite event loop.

I also fear that having to register each tool with the system will make running simple scripts (e.g. using python-evdev) very complicated.

Input Layers .. The point of the above is that there are multiple layers to the input system, and if we want a well-oiled input mapping&macro system to help disabled users, just being able to modify the lowest layer may not be sufficient. Ideally we would have a way to insert a hook at any point in the chain "device layer → input layer → IME layer" as well as before and after other hooks, but many of us ended up focusing only on the device layer because the Linux kernel is being cooperative to mapping.

It might be nice - but we need to be careful. If one event were to trigger another event in a layer thats below it's own we might end up in an endless loop.

I haven't read into the IME layer yet and I have never heard of it. My knowledge has been that mapping to characters that are not part of the keyboard layout is difficult, and it requires editing the keyboard layout and has often been a pain point for users. I don't know if keyboard layouts can be edited in wayland at all. Aren't xkb files used by wayland as well? Configuring those is a nightmare and not a viable alternative.

Yes, xkb is used in Wayland as well and I completely agree that editing the keymaps is a nightmare.

Since you have been talking about keeping events as slim and small as possible, I guess this isn't a good idea, but it would be nice if events could contain an unicode code as an alternative to (EV_KEY, KEY_A, 1).

The device layer doesn't even know the concept of characters. XKB does - so this type of event needs to be injected at it's layer (input?).

sezanzeb commented 10 months ago

The device layer doesn't even know the concept of characters. XKB does - so this type of event needs to be injected at it's layer (input?).

Wayland (though I believe that KDE and Gnome are responsible for keyboard layouts, but I probably won't find the source for that information anymore) would have to recognize that a unicode character is included in the event payload and use that, instead of looking into the keymap.

Considering the issue with special characters and application-preset-switching, I doubt that a solution that only works via evdev will be really satisfying for everyone. Would it make sense to open an issue on their repo and ask them what they think? https://gitlab.freedesktop.org/wayland/wayland

pallaswept commented 10 months ago

It seems like no matter what we talk about here, the punchline ends up either being "we need the kernel to do it " or "we need Wayland to do it" (which specifically seems terrible because that also implies we need every single Wayland compositor to do it, too... it'll be YEARS before we see any kind of solution).

Long story short, it seems like we need to escalate this no matter what, and I think the kernel is the place to go, because Wayland is definitely not.

KarsMulder commented 10 months ago

While there is truth in the saying "don't let the perfect get in the way of the good", I would prefer to use this opportunity to properly solve this problem, and right now I too think that we need to get Wayland support for it.

Getting a new Kernel API would certainly solve issues regarding to two evdev mapper scripts working together, but there are some problems that we're simply not getting solved on an evdev level:

Active-window-dependent mappings;
Mapping to keys not on the keyboard;
AHK-style hotstrings;
Gestures;

And probably some more. Frankly, if we intend to keep our Wayland keymappers working on evdev level, we will never get close to the amount of input customization that Windows offers.

@pallaswept, if you don't mind, could you give some examples of fancy keymapping stuff you get done on Windows that's hard to accomplish under Linux/Wayland? To whomever we escalate this, I think "disabled people need these features to properly use Linux" would go a long way to convincing them.

I'm not even sure why Wayland does not support keymapping yet. I mean, yes, I get the security issues, but I'm sure we can find a solution for that even if that solution is "until we find a better authentication mechanism, only programs running as root may register keymappers." I think it is more of a "nobody found it necessary enough yet to actually spend time drafting a protocol and reference implementation" than a "we conceptually hate keymappers."

Looking up the previous keymapping-related discussion on the Wayland communication channels sometime is on my to-do list.

For those who are not familar with it: Wayland is not a single monolithic protocol that requires all compositor's collaboration to change. Wayland consists out of a few core protocols that all compositors tend to implement, plus a variety of additional protocols that are implemented by only one or a few compositors.

While ideally we'd have all compositors implement a new fancy key-mapping protocol, I think we could get far if we could write up some solid use cases that require a new protocol, draft a new protocol, write a reference implementation that is easy for composers to integrate, and then convince one compositor to actually integrate the new protocol.

Interesting tidbit: KDE even has a protocol that allows you to create fake input events. It would be possible to create some kind of input daemon that grabs all /dev/input/event* devices like @sezanzeb suggested, exposes an interface that allows third-party tools to map them, and then directly output libinput-style events to Kwin. This protocol itself however does not include a way to suppress events and does not supply a way for multiple keymapping tools to chain into each other, and even says that the compositor is free to ignore the generated keys (=> total loss of input if all evdev devices are grabbed), so I don't think this is the ideal protocol we should aim to standardize.

Anyway, I am currently suffering from impostor syndrome and I will not be writing any proposals until I understand Wayland and its input stack better. I won't have much time to read up on it until the weekend though.

pallaswept commented 10 months ago

@pallaswept, if you don't mind, could you give some examples of fancy keymapping stuff you get done on Windows that's hard to accomplish under Linux/Wayland? To whomever we escalate this, I think "disabled people need these features to properly use Linux" would go a long way to convincing them.

As much as I try not to abuse my disability as a 'poor me' routine, that kinda is the reality of it, so I'm fine with that. Most of what I need to do isn't particularly fancy, but it does demand a lot of flexibility.

Like let's use a real example, something common like ctrl+scroll to zoom, I might need to be able to map a mouse side button to ctrl (at present, I'm doing that with the firmware of the mouse, which requires a windows VM to run the proprietary configuration software, and the mouse has a USB HID interface of a mouse and also of a keyboard, so the mouse's keyboard interface outputs the ctrl key... but having the need to have a windows VM to make linux work is a PITA and right now I don't even have one, I set up the mouse on my carer's (who lives with me) PC who was kind enough to allow me to install my mouse's software on it... but just doing that walk across the house and sit at a strange desk makes it an all-day job to do something like this, plus another day or two to physically recover). I need to do that because my ability to use both hands is not something I can rely on, so I might only be able to use the mouse, or the keyboard, but not both.

So now we're just trying to crtl+scroll but maybe I can't use my middle finger so I might need to use ctrl+plus, and my whole right hand might be out of action, so I might need the ctrl key bound to a footswitch and use the plus key with my good hand, or maybe my left hand is out of action today ans so is my right thumb, so I need to use the footswitch for ctrl and the mouse wheel to scroll or plus key, to get my zoom.

But this kind of thing is super annoying because now let's say I'm trying to use scrcpy to control my phone (since picking it up and holding it to my head is difficult, I love the ability to remote control it over the PC, and I can use pipewire's bluetooth functionality to use the PC as a speakerphone/headset unit for the phone) but the mouse button I have hard-mapped to the ctrl key, that's the equivalent of the app switch button on the phone, which means I need to use the keyboard shortcut ctrl+s to get the same function, but if my left hand is giving me trouble, that isn't really doable, and reaching across my body to hit that combo with my right hand might just wreck my right shoulder for a few days.

So all of this story is meant to demonstrate that I need to be able to map all kinds of things to the ctrl key, but dynamically, as in sometimes I want to map it, sometimes I don't, and I want to be able to toggle it on and off on the fly; and the ctrl key needs to be able to come from any number of devices (footswitch, mouse, etc, all of which would be re-mapped eg the F24 key from the footswitch (also configured with a windows VM, hence the rarely-used F24 key mapping) or the 'history forward' front-side button on the mouse)

Then there's more complex remapping like, what if I am minus my left hand, I need ctrl,alt,shift, and super keys of course, so I could map them to the mouse side buttons (forward = ctrl, back = shift, forward+back = super, back+forward = alt) and I might dynamically map the middle-click to the enter key with forward+middle-click = enter, so I can do a lot of stuff without my hand ever leaving the mouse.

All this kind of stuff is a snap with something like AutoHotKey on windows, but on linux, even a lot of it in X, it's just not doable.... And regardless, I can only use Wayland because of the way it handles multiple displays with different refresh rates, something that both X and Windows' display manager do poorly - they composite the entire desktop at the frame rate of the highest refresh rate of the attached monitors (windows used to use the refresh rate of the 'primary' monitor but they recently changed it), meaning that the lower refresh rate monitor either skips frames, or tears, and for me, that means migraines. And my migraines aren't just bad headaches, they're blinding, crippling, paralysing things like a minor stroke. That being said, the same visual artifacts that cause me migraines, for others, cause epileptic seizures.... so I consider myself lucky. So obviously I have a pretty strong love/hate relationship with Wayland - on the one hand, it's sparing me a lot of nasty migraines, on the other, it's doing permanent damage to my body because I can't get my input devices to remap like I used to. And I could get around it with a combination of some existing tools like keyd and input-mapper and ydotool, but I can't combine most of them because of the exclusive evdev grabbing they use.

I'm a pretty unwell individual but there are a lot of people a lot worse off than I am, too, and I know they have similar needs. I talked about this a little with my neighbour, who is bound to a wheelchair with limited movement to her upper body, as well... I was talking about building her some embedded devices to assist her with controlling her electronics, but she's pretty much in a place where if she can't do it on a touchscreen tablet she can't do it. And as I mentioned earlier, even able-bodied people have a need for stuff like this, as their input devices get complex, like my friend with his cockpit simulators and a gajillion USB input devices that all need to behave differently depending on the game he wants to play. It's definitely a thing we all need.

KarsMulder commented 10 months ago

So I've done a bit of reading up. I've diven just deep enough to realize that there is a whole lot more to dive into.

I still need to get to the bottom of how Wayland/libinput "seats" exactly work. And the rest of libinput for that matter.
My IME (fcitx5) appears to work just fine under Wayland (KDE). How?
I need to investigate which accessibility features GTK and Qt make available.
I need to read up on previous communication regarding input mappers on Wayland.
I wonder which API's Windows programs use to work together seamlessly?

... so this is still going to take me a while. I suppose that designing a solid protocol is not something you can get done in just a few days of work.

In the meanwhile, I've written up some of my thoughts.

KarsMulder commented 10 months ago

Input mapping approaches

As pallaswept demonstrated, it is necessary for other mapper scripts to observe events generated by other mapper scripts. Because of this, I am going to assume that the kind of model we want involves a set of mapper scripts with a given ordering, and the output of the previous mapper script is the input for the next mapper script.

For simplicity, I am going to pretend that the mapper scripts have a linear ordering, i.e. the first script feeds into the second which feeds into the third and so on. This may not end up being the final model (e.g. a mapper script for a keyboard may not need to have its output fed into a joystic mapper script), but it makes drawing images easier. Just pretend that this hypothetical joystick mapper script just passes on the keyboard events unmodified.

Wayland protocol-level mapping With that in mind, I think that there are two ways to go about this. The first one is an extension of the Wayland protocol that allows programs to announce that they want to map events, and the Wayland compositor is in charge of routing the events through the mapper scripts. The compositor should also inform the mapper scripts what the currently active window is, and whatever other information it wants to share:

wayland_extension

Input Daemon The second approach is to add a Wayland protocol extension that allows you to tell the compositor that the compositor is no longer in charge of turning evdev events into input events, and but that a new daemon is in charge of doing so. The compositor should inform the daemon which window is currently active etc.

daemon_extension

The main difference is that in the first example, the Wayland daemon is in charge of reading events from evdev devices and processing them with libinput. In the second example, the libinput part of the Wayland compositor get essentially deactivated, and turning evdev events into libinput events becomes the task of the input daemon.

The input daemon may then offer a new interface that various mapper scripts can use. Systemd should automatically start the input daemon when a mapper script needs it.

In comparison The advantages of the first (Wayland protocol-level mapping) approach are:

Avoids one layer of IPC (Inter Process Communication) latency between the Input Daemon and the Compositor;
Adds a standardised way of using multiple mapper scripts on the same system;
Less bloatware;

Its main disadvantage is that the current event model that the Wayland protocol exposes to applications is not very suitable for mapping. We would need to add a new event model to the Wayland protocol and may need to hook into lower levels of the libinput library. The Wayland compositor may also end up becoming in charge of reordering the mapper scripts if their startup order differs from the order of the chain, figuring out how to restore a consistent view of the input if one script crashes, etc.

It also means that there is one IPC worth of latency added for each script that is in use.

The advantages of the second (Input Daemon) approach are:

Moves the implementation details of what the event model should be out of the compositor, allowing it to more easily expose many intermediate layers of event processing;
Potentially much faster event processing per mapper script (explained below);

The big disadvantage is that mapper scripts can only run simultaneously if they're written for the same input daemon (or the different input daemons at least offer compatible interfaces.)

Shared-object based mappers for performance and modularity? We could make the Input Daemon open a pipe to communicate with each mapper script that has been launched, which would obtain about 0.3ms of latency for each efficiently implemented mapper script. However, maybe we could (in addition pipe-communication) also make it possible for mapper scripts written in C/C++/Rust to be compiled as shared objects instead, and then have Input Daemon load those shared objects into its own memory space. When working with scripts written as shared objects, the latency overhead per script would be measured in nanoseconds, making it no problem to have hundreds of small scripts running at once.

The advantage of the shared object-approach would be that we could very easily split input processing in many layers without having to worry about the latency it carries, which makes it very customizable. For example, this is what the default keyboard configuration screen looks like right now in KDE Plasma:

KDEConfig

Instead of needing to have a single program (the KDE compositor in this case) apply all configuration options, it would be possible for each of those options to become a tiny "mapper script" in the form of a shared object. We could process like a hundred active mapper scripts in a microsecond, so no performance worries. This kind of modularity would make it easy for third parties to write additional configuration options that could get automatically included in the configuration screen if they are installed, and would make it easier for big mapper scripts to say exactly which point in the mapping chain they want to hook into.

The disadvantage of the shared object approach (compared to the pipe approach) would be (1) the shared-object based mappers are harder to write than normal programs, e.g. they need to make sure they can clean up all memory they allocate, and (2) if a single mapper crashes, the whole input daemon crashes.

Haven writen all of that, I think that the first approach (Wayland protocol-level mapping) is the better way to go. The input daemon's approach has serious risk of fragmenting the ecosystem, and I am not sure if the reduced latency by using shared objects is worth the risk of the whole input system crashing.

However, that does mean that we still need to figure out how we want the Wayland protocol to communicate events to the mapper, because the way it communicates them to programs just won't do; it has already merged events from different sources into a single source, which is troublesome e.g. when you want two footbuttons to do different things.

KarsMulder commented 10 months ago

Mapping to keys not on the keyboard

The situation regarding mapping to keys that are not part of the layout is not great, but better than desparate. The Wayland protocol passes scancodes and a keymap (in XKB format) over to the clients. The clients on their turn are expected to individually link to libxkbcommon to turn those scancodes into characters.

This means that at Wayland's input protocol level, we cannot decide that we just send a key あ to a client if that key is not on the used keymap.

Workaround We could set up our new protocol such that it can modify not only the events, but that it can also announce that it wants applications to use a different keymap than the one that has been configured on the system. Each mapper script should receive the current keymap in use, and get the option to announce that applications, and mapper scripts later in the chain, should use a different keymap:

keymap

Wayland protocol does allow the compositor to inform applications that the keymap has changed, so there should be no issue if a well-behaved application was already running before a mapper script launched and changed the keymap. We could provide a library for aspiring keymap programs which exposes functions like "read this keymap, add a character あ corresponding to a currently unused scancode, and then serialize the result as a new XKB file". We then need the Wayland compositor to inform all programs that a new keymap is to be used henceforth.

(I'm not sure how difficult that would be. It may or may not be a nontrivial task.)

The good news is that the keymap file can just be stored in shared memory, it doesn't need to be saved to the harddisk. The bad news is that the only syscall I can find that actually creates that shared memory, also creates a link on the filesystem refering to it as "courtesy". That link can be deleted immediately, but it is a security hole.

In case multiple keymappers want to modify the keymap, it should work fine in most cases: Mapper A can associate character あ with the previously-unused scancode 250. Then Mapper B receives the keymap generated by Mapper A and generates another new keymap which further associates character い with scancode 251.

By the way, it appears that both libxkbcommon and the Wayland protocol use 32-bit scancodes, which gives us quite some space to work with. You do need to declare the highest scancode you intend to use in the XKB keymap file though, and the current libxkbcommon implementation will allocate memory based on the maximum scancode you declared. Declaring a maximum scancode higher than necessary will lead to performance regressions.

Issues This does mean that each mapper should know ahead of time which keys it could possibly desire to generate. Even though we do have as many scancodes available as we need, we probably don't want to create a keymap that contains all 1,112,064 unicode code points. If a keymapper changes its mind later (e.g. because a new keyboard got attached) then it could issue a "change keymap" command. This situation is significantly better than the situation of evdev capabilities.

If we want keymapper scripts that start later to hook themselves into the chain before other already-running keymapper scripts, there could be conflicts in some cases when the later-launched early-chain script decides to use the same keycode that an earlier-launced late-chain script had already used.

Suppose Mapper B starts first and decides to modify the keymap to associate scancode 250 with い. Next, the user starts Mapper A and wants it to put it before Mapper B in the chain; e.g. if Mapper A were to generate a Ctrl key, then Mapper B should see that key.

The new Mapper A does not see the modified map of Mapper B because it is positioned before Mapper B in the chain, and decides to associate scancode 250 with あ. Mapper B is subsequently notified that the keyboard map has been updated, but can no longer associate scancode 250 with い like it used to—what should Mapper B do?

Surely Mapper B does have solutions to this problem, but such things need to be taken into consideration when writing the XKB-layout-modifier library.