kermitfrog / inputmangler

Inputmangler is a daemon that intercepts and transforms Linux input events depending on the active window. It aims to be highly configurable, reliable and not too hard to use. You can use it to remap those extra mouse buttons, properly utilize a footswitch, or even remap your second keyboard to trigger all kinds of shortcuts! If you are left-handed, you can swap the left and right mouse buttons for applications that ignore your desktop's settings (like dosbox and fs-uae). And you can have a different configuration for each window! It is also capable of translating text received over the network into key presses.

UInput Orchestrator #5

Open kermitfrog opened 2 months ago

kermitfrog commented 2 months ago

UInput Orchestrator (UIO)

What is the idea?

A daemon that manages connections between mappers by creating, assigning and re-assigning multiple uinput devices; it could be extended to something else later. This should result in a stable path where multiple input mappers can process an input device in a deterministic order. Mappers connect through a few functions in a library and can request file descriptors to matching (virtual) input devices. UIO's job is to ensure that the right mappers are connected to the right devices, in the right order.

Disclaimer: this is an early draft and I have not done enough research to be sure that it is technically feasible (or even possible).
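To make the idea a bit more concrete, here is a minimal sketch of what the client-side library might look like from a mapper's point of view. All names here (`UIOClient`, `request_context`, the device/role strings) are invented for illustration; nothing is specified yet.

```python
# Hypothetical sketch of the client-side library API (names invented for
# illustration; nothing here is specified yet).

class UIOClient:
    """Connection to the UIO daemon for one mapper."""

    def __init__(self, mapper_path: str):
        # The mapper is identified by its path (abbreviated M# in the diagrams).
        self.mapper_path = mapper_path
        self.contexts = {}

    def request_context(self, device: str, role: str):
        """Ask UIO for a context on the given input device.

        The real daemon would confirm the request with the user via a GUI and
        then hand back file descriptors for the matching virtual devices.
        """
        ctx = {"device": device, "role": role, "input_fd": None, "output_fd": None}
        self.contexts[(device, role)] = ctx
        return ctx

# Usage sketch:
client = UIOClient("/usr/bin/examplemapper")
ctx = client.request_context(device="usb-keyboard-0", role="layout")
```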

Let's start with a few diagrams.

The first is about order.

order

Each mapper has contexts, identified by its path (abbreviated here to M#) and role. Roles can be requested by the mapper or configured by the user. If configured by the user, the mapper can request available roles from UIO. If a mapper wants to create a context, a GUI asks the user to confirm. "New" is where new contexts pop up by default. All contexts are specific to an input device (although input devices could be grouped for easier configuration).
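One way the daemon might keep track of this is an ordered list of contexts per input device, each context identified by (mapper path, role). The data layout below is invented for illustration:

```python
# Hypothetical in-daemon bookkeeping: per input device, an ordered list of
# contexts, each identified by (mapper path, role). "new" is where contexts
# land by default; the user can then reorder them.

from collections import defaultdict

class ContextTable:
    def __init__(self):
        # device id -> ordered list of (mapper_path, role)
        self.chains = defaultdict(list)

    def add_context(self, device: str, mapper_path: str, role: str = "new"):
        self.chains[device].append((mapper_path, role))

    def reorder(self, device: str, order: list):
        # Apply a user-chosen order (e.g. coming from a GUI).
        assert sorted(order) == sorted(self.chains[device])
        self.chains[device] = order

table = ContextTable()
table.add_context("usb-kbd-0", "M1", "layout")
table.add_context("usb-kbd-0", "M2")  # lands in the default "new" role
```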

Startup and first events

init

I hope this is somewhat clear..

UIO makes sure there is a chain of (for now!) uinput devices. It can open and create input devices, then share the FDs with mappers through UIOInput/UIOOutput. Virtual devices can be kept open as it deems necessary (e.g. for short-lived scripts, or for a short while after a mapper exits in case it's just restarting).
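Sharing already-opened FDs between a daemon and its clients is possible over a Unix domain socket via SCM_RIGHTS; Python 3.9+ wraps this in `socket.send_fds()`/`socket.recv_fds()`. A minimal sketch (using `/dev/null` as a stand-in for a real event device):

```python
import os
import socket

# A daemon can pass an already-opened device FD to a mapper process over a
# Unix domain socket using SCM_RIGHTS (wrapped by socket.send_fds/recv_fds,
# available since Python 3.9).

daemon_sock, mapper_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Daemon side: open a file standing in for /dev/input/eventX and send its FD.
fd = os.open("/dev/null", os.O_RDWR)  # placeholder for a real event device
socket.send_fds(daemon_sock, [b"dev"], [fd])

# Mapper side: receive the message and the duplicated FD.
msg, fds, flags, addr = socket.recv_fds(mapper_sock, 1024, 1)
received_fd = fds[0]
# received_fd is now a valid FD in the mapper's process, referring to the
# same open file description; it can be read, written or epoll'ed.
os.close(received_fd)
os.close(fd)
```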

I hope it is possible to manage access rights to the virtual devices in a safe and stable way.

UIOInput and UIOOutput offer transparent read/write functions. My plan is to use uinput for now, but this may be extended and configured to support other ways of communication between mappers, like a direct shared buffer (for performance) or one that is managed by UIO and keeps the state of all keys (lower performance, but safer handling of some cases). read_evdev() means that it returns the event as evdev would. We could add transformations to libinput structs, etc. later.
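One interpretation of read_evdev() is a thin wrapper that reads raw `struct input_event` records from the FD handed out by UIO and delivers them unchanged. A sketch, assuming the 64-bit Linux struct layout (the `UIOInput` class name matches the diagram; the rest is invented, and a pipe stands in for the real device FD):

```python
import os
import struct
from collections import namedtuple

# struct input_event on 64-bit Linux: two longs for the timestamp (tv_sec,
# tv_usec), then type (u16), code (u16), value (s32).
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)

InputEvent = namedtuple("InputEvent", "sec usec type code value")

class UIOInput:
    """Hypothetical wrapper around an FD handed out by UIO."""

    def __init__(self, fd):
        self.fd = fd

    def read_evdev(self):
        # Return the next event exactly as a read on the evdev node would.
        raw = os.read(self.fd, EVENT_SIZE)
        return InputEvent(*struct.unpack(EVENT_FORMAT, raw))

# Demo with a pipe standing in for the real device FD:
r, w = os.pipe()
os.write(w, struct.pack(EVENT_FORMAT, 0, 0, 1, 30, 1))  # EV_KEY, KEY_A, press
event = UIOInput(r).read_evdev()
```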

Window change

window_update

We may have options to handle cases where a keycode changes while it's pressed. But I'm not sure how/where to do that yet.
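One possible approach (just an idea; where this logic should live is still open): track which keys are pressed per context, and when the mapping changes mid-press, synthesize a release for the old output code so no key gets stuck. The class and names below are invented for illustration:

```python
# Hypothetical state tracker: if the mapping for a pressed key changes, emit a
# release event for the old output code so no key stays stuck.

class PressedKeyTracker:
    def __init__(self):
        self.pressed = {}  # input code -> currently mapped output code

    def key_event(self, code, value, mapping):
        out = mapping.get(code, code)
        if value == 1:            # press: remember the output code we used
            self.pressed[code] = out
        elif value == 0:          # release: reuse the code we pressed with
            out = self.pressed.pop(code, out)
        return [(out, value)]

    def remap_changed(self, new_mapping):
        # Synthesize releases for keys whose output code changed mid-press.
        fixes = []
        for code, old_out in list(self.pressed.items()):
            new_out = new_mapping.get(code, code)
            if new_out != old_out:
                fixes.append((old_out, 0))   # release the old code
                fixes.append((new_out, 1))   # press the new one
                self.pressed[code] = new_out
        return fixes

tracker = PressedKeyTracker()
tracker.key_event(30, 1, {30: 31})        # press: 30 mapped to 31
events = tracker.remap_changed({30: 32})  # mapping changes while held
# events == [(31, 0), (32, 1)]
```

Whether the new code should be pressed immediately (as sketched here) or only on the next physical press is one of the open questions.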

Advantages

Disadvantages (for now)

Some open questions

kermitfrog commented 2 months ago

Reserved

pallaswept commented 2 months ago

This seems so simple and intuitive as a solution. As you mentioned, the only immediate concern is that it might require long chains of virtual devices. Honestly I'd say that any inherent performance issue with that is probably unintentional.

If I were a kernel dev and someone asked me if there were performance issues, I'd probably say, "I don't know, are there performance issues?" :laughing: I guess we'll probably have to try it and find out... but opening a simple, static chain of uinput devices and passing between them should hopefully be fairly simple. I'm actually thinking that interception might be able to do it, out of the box?

kermitfrog commented 2 months ago

I had a look at interception. It seems to do some of those things, but not everything. The biggest difference is that interception starts processes and pipes them together, which has some limitations/problems, e.g.:

But their udevmon code might prove valuable in order to understand the udev APIs :) -- I don't find the official docs very helpful.

pallaswept commented 2 months ago

I had a look at interception. It seems to do some of those things, but not everything.

Sorry, what I meant was that, Interception might be useful to test the effect of having many uinput devices open... as in, maybe Interception can help to answer your question:

May create a lot of virtual input devices. Is this bad for performance?

I agree, it would be too limited to reach your intended goal.

KarsMulder commented 2 months ago

I suppose that the biggest problem that needs to be solved is indeed making several input mappers use each other's output, in a way that does not require the end user to manually configure input and output devices for every single mapper they use.

A daemon which a mapper could ask "I want to map keyboard devices to keyboard devices. Give me the input and output devices I should use." would indeed solve that problem. Without any configuration on the user's side, the daemon could ensure that each mapper gets put on a single deterministic part of the chain, and if the user doesn't like the order the daemon automatically chose, they can reorder it easily in a single GUI written for the UIO daemon, without having to reconfigure each mapper manually.

That shifts the task from convincing the Wayland crew to adopt a new protocol to:

  1. Convincing the developers of the mapper scripts to add support for getting their input devices and output devices from the UIO daemon (if available);
  2. Convincing distributions to add a package containing the UIO daemon to their repository.

Which may potentially be easier, but it really depends on how willing the majority of the input mapper developers are to go along with it.

Does not require big changes to existing mappers (I hope).

I do have several thoughts regarding whether it is possible to create a sufficiently transparent wrapper like UIOInput that does not require big changes to existing mappers, but no coherent conclusion regarding that yet.

Currently my biggest worry is how this is going to affect the event loop: on a low level, mappers would now need to maintain an open communication channel with the UIO daemon (whether over D-Bus or a Unix socket) and may occasionally need to change which event devices they have open, and thus change which file descriptors they poll/epoll. I think that abstracting that away would significantly decrease performance, requiring the high-performance oriented mappers to do some nontrivial plumbing around their event loop. But I'm not wholly sure of that yet. There are many options to consider here.
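The plumbing concern can be illustrated with epoll (here via Python's `selectors`, which is epoll-backed on Linux): a mapper would register both the control channel to UIO and its current device FDs, and swap a device FD on the fly when UIO reassigns it. The sketch below uses pipes as stand-ins for the real sockets and devices:

```python
import os
import selectors

# Sketch of the event-loop plumbing a mapper would need: one registered FD is
# the control channel to UIO, the others are the current device FDs. When UIO
# announces "your device FD changed", the old FD is unregistered and the new
# one registered -- without restarting the loop.

sel = selectors.DefaultSelector()  # epoll-backed on Linux

control_r, control_w = os.pipe()   # stand-in for the UIO control socket
device_r, device_w = os.pipe()     # stand-in for the current device FD

sel.register(control_r, selectors.EVENT_READ, data="control")
sel.register(device_r, selectors.EVENT_READ, data="device")

def swap_device_fd(old_fd, new_fd):
    # Reassignment: UIO hands us a new FD; re-register it without disturbing
    # the other registered FDs or restarting the loop.
    sel.unregister(old_fd)
    sel.register(new_fd, selectors.EVENT_READ, data="device")

new_device_r, new_device_w = os.pipe()
swap_device_fd(device_r, new_device_r)

os.write(new_device_w, b"\x01")
events = sel.select(timeout=1)
labels = [key.data for key, _ in events]
```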

how to handle crashing mappers? can they kill UIO?

I think that UIO should be designed such that a crashing mapper cannot crash UIO.

I think it would be acceptable for a crashing mapper to crash UIO if mappers were written as shared objects (.so) that are dynamically loaded into UIO's memory space, kind of like a kernel module getting loaded into the kernel. That would greatly increase performance at the cost of making mappers harder to write and allowing one of them to bring down the whole house of cards.

As long as we do not make the tradeoff of allowing mappers to enter UIO's memory space, crashing mappers should not crash UIO.

May create a lot of virtual input devices. Is this bad for performance?

I've written a small benchmark with python-evdev to check how fast my program evsieve can grab and mirror an input device 750 times:

#!/usr/bin/env python3

import asyncio
import evdev
import evdev.ecodes as e
import os
import subprocess as sp
import time

ALPHABET = list("abcdefghijklmnopqrstuvwxyz")
NUM_KEYS_TO_SEND = 200
TIME_BETWEEN_KEYS = 0.1

# Create a device that we will send events into.
capabilities = {
    e.EV_KEY: [
        e.ecodes["KEY_" + key.upper()]
        for key in ALPHABET
    ]
}

input_device = evdev.UInput(capabilities, name="virtual-keyboard")
INPUT_DEVICE_SYMLINK = "/dev/input/by-id/benchmark-0"
if os.path.islink(INPUT_DEVICE_SYMLINK):
    os.unlink(INPUT_DEVICE_SYMLINK)
sp.run(["ln", "-s", "--", input_device.device.path, INPUT_DEVICE_SYMLINK])

# Creates one layer that clones the previous layer's input device.
def create_layer(index: int):
    input_path = f"/dev/input/by-id/benchmark-{index}"
    output_path = f"/dev/input/by-id/benchmark-{index+1}"
    args = ["systemd-run", "--service-type=notify", "--collect", "evsieve"]
    args += ["--input", "grab", "persist=exit", input_path]
    args += ["--output", f"create-link={output_path}"]
    sp.run(args)

# Create all layers.
NUM_LAYERS = 750
for i in range(NUM_LAYERS):
    print(f"Creating device {i+1}/{NUM_LAYERS}")
    create_layer(i)

# Then open the device created by the last layer.
output_device = evdev.InputDevice(f"/dev/input/by-id/benchmark-{NUM_LAYERS}")
output_device.grab()

# Sends events to the input device, then closes the input device when done.
async def send_events_then_close(device):
    timestamps_of_sending_events = []

    for event_index in range(NUM_KEYS_TO_SEND):
        keycode = e.ecodes[f"KEY_{ALPHABET[event_index%len(ALPHABET)].upper()}"]

        timestamps_of_sending_events.append(time.time())
        device.write(e.EV_KEY, keycode, 1)
        device.syn()
        await asyncio.sleep(TIME_BETWEEN_KEYS / 2)

        timestamps_of_sending_events.append(time.time())
        device.write(e.EV_KEY, keycode, 0)
        device.syn()
        await asyncio.sleep(TIME_BETWEEN_KEYS / 2)

    # Give the other tasks some time to finish reading events before we exit.
    await asyncio.sleep(1.0)
    device.close()

    return timestamps_of_sending_events

# Measure the times at which the events can be observed from the output device.
async def read_events(device):
    timestamps_of_reading_events = []

    try:
        async for event in device.async_read_loop():
            if event.type == e.EV_KEY:
                timestamps_of_reading_events.append(time.time())
    except OSError:
        return timestamps_of_reading_events

# Tell the user what the average difference between the input and output events is.
def present_report(timestamps_in, timestamps_out):
    total_delta = 0
    count = 0
    assert(len(timestamps_in) == len(timestamps_out))

    # Measure the total difference between the time at which we wrote events to the input device
    # and the time the event showed up at the output device after being mapped through NUM_LAYERS
    # amount of layers.
    for time_in, time_out in zip(timestamps_in, timestamps_out):
        total_delta += (time_out - time_in)
        count += 1

    MICROSECONDS_PER_SECOND = 1000000
    print("")
    print(f"Average delay of {round(total_delta/count/NUM_LAYERS * MICROSECONDS_PER_SECOND * 10)/10} microseconds per layer per event over {count} events and {NUM_LAYERS} layers.")

async def main():
    timestamps_in, timestamps_out = await asyncio.gather(
        send_events_then_close(input_device),
        read_events(output_device),
    )
    present_report(timestamps_in, timestamps_out)

asyncio.run(main())

On my system, it outputs

Average delay of 41.5 microseconds per layer per event over 400 events and 750 layers.

There does not appear to be any worse-than-linear scaling involved as the chain of input devices becomes longer. At least, for the purpose of event latency. Maybe some other programs are poorly equipped to handle a large number of input devices. For example, libinput will probably need to open every single input device even if most of them are grabbed. The epoll syscall can read events from any number of devices in an O(1) amount of time, so an efficient program that uses epoll shouldn't be slowed down by having additional devices that do not actually generate events other than by the one-time cost of opening them all.

Also, another thing I ran into: there was a limit to how many layers I could use in the above benchmark. Specifically, 776 layers was the maximum my system could handle. I'm not sure why that specific number. It does seem to be possible to create more UInput devices than said arbitrary limit, but those devices do not show up under /dev/input/event*, and as such are practically invisible to the rest of the system.

Based on ls /dev/input, the event device numbers only go up to /dev/input/event1023. I don't know why the maximum amount of layers my system could handle was 776 instead of ~1000.

A maximum of ~1024 event devices is not an unreachable cap, but still one that will in practice probably not be met that often. Maybe the cap is arbitrary and could be raised by the kernel devs if there is a need to, or maybe there are more fundamental reasons for the cap like a limited amount of device node numbers in some POSIX standard.