splondike / wayland-accessibility-notes

Research on accessibility in the Wayland based Linux desktop

Talon requirements: Why is the application PID required? #1

Open mwcampbell opened 11 months ago

mwcampbell commented 11 months ago

I noticed that a high-priority requirement for Talon was a way to get the PID of the application. Why is this needed? Might there be another way to accomplish the same thing?

splondike commented 11 months ago

Talon would be using a window's PID mainly to determine the associated executable owning the window. This lets us change the set of active voice commands depending on the focused app. Consider the following two files written in Talonscript:

app.exe: gedit
-
paste that: key(ctrl-v)

app.exe: emacs
-
paste that: key(ctrl-x ctrl-s)

The first file says that if the currently focused app's executable is 'gedit', then saying 'paste that' presses 'ctrl-v'. The second says that if it's emacs, it presses 'ctrl-x' then 'ctrl-s'. The executable would be determined by first getting the window's PID, then using that to read the command line from e.g. /proc/<pid>/cmdline, then taking the first argument from that as the executable.
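As a rough illustration of that lookup (not Talon's actual code), here is a minimal Python sketch. The function names are hypothetical. Reducing argv[0] to its basename is an assumption about what "the executable" means here.

```python
import os


def executable_from_cmdline(raw: bytes) -> str:
    """Return the basename of argv[0] from the raw bytes of a cmdline file.

    Hypothetical helper: using the basename is an assumption made for
    illustration, not something the discussion specifies.
    """
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    argv0 = raw.split(b"\0", 1)[0]
    return os.path.basename(os.fsdecode(argv0))


def executable_for_pid(pid: int) -> str:
    """Read /proc/<pid>/cmdline (Linux only) and extract the executable name."""
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        return executable_from_cmdline(f.read())
```

For a window whose process was started as `/usr/bin/gedit notes.txt`, this yields `gedit`, which is what the `app.exe: gedit` matcher above would compare against.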

We also have matchers other than app.exe, most importantly title for the active window's title. But you can see that app.exe might be more reliable than the title. For example, my terminal changes its title depending on what program is running in it, but I might still want to use my terminal's keyboard shortcuts with voice commands.

Another possible future reason is to facilitate authorization with a potential future RPC API. In this case the app might create a Unix socket named after its PID for talking with Talon. When its window is focused, Talon would need to know which socket to use to talk to it.
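To make the socket-naming idea concrete, here is a small Python sketch. No such RPC API exists yet, so everything in it is hypothetical: the `talon-rpc` directory, the `<pid>.sock` naming scheme, and the function name are all assumptions for illustration.

```python
import os
from typing import Optional


def rpc_socket_path(pid: int, runtime_dir: Optional[str] = None) -> str:
    """Return the socket path an app with this PID might listen on.

    Hypothetical layout: <runtime_dir>/talon-rpc/<pid>.sock
    """
    # Fall back to XDG_RUNTIME_DIR, then /tmp, if no directory is given.
    base = runtime_dir or os.environ.get("XDG_RUNTIME_DIR", "/tmp")
    return os.path.join(base, "talon-rpc", f"{pid}.sock")
```

Given the focused window's PID, Talon could compute the same path and connect to it. Without the PID it would have no key to look the socket up by.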

Also, thanks for the interest here, and for your work on AccessKit and prototyping a next gen AT-SPI protocol for Linux :).

mwcampbell commented 11 months ago

and prototyping a next gen AT-SPI protocol for Linux :).

That's exactly why I'm asking about this; I want to make sure the new protocol provides what ATs like Talon need, while also not being coupled too tightly to implementation details that can change. And I think the PID of the application is just such an implementation detail. What if each window had an "application name" property instead? That raises the question of whether it's important for that property to be unforgeable, that is, whether it has to be set by a privileged OS component, or whether it can be set by the app itself. If I'm not mistaken, AT-SPI already has an app name property of the latter type.

splondike commented 11 months ago

Yeah, I think an application name with something useful in it would be enough for our needs. I could see any of the following going into it and doing a decent job: the flatpak id, the name or executable path from the .desktop file, or just the executable path. A related concept exists in Talon under OSX: we have an 'app.bundle' matcher (e.g. com.apple.finder), which tends to be used in preference to app.exe on that OS. I guess OSX apps are like Android ones in that they get defined names.

One potential issue that could make this discussion about a new AT-SPI tangential to Talon's needs is that a11y APIs are opt-in from the app's perspective. So if we relied only on AT-SPI to know things, we'd be missing some apps (e.g. old ones implemented on GTK 2, or random Windows games run under Wine). Under X, Talon at least always knows the executable and the window title of the focused app, among other things.

One potential way around this opt-in issue is if we could somehow get the desktop environment to provide this minimal context to the AT-SPI system for all windows it's managing. The apps themselves could then augment this minimal information with the actual contents of the windows if they chose to implement AT-SPI.

This idea might also speak to the question of the application name being unforgeable, since the desktop environment could set it. This feature wouldn't be essential for Talon, I think, but it wouldn't hurt. It might be more useful for screenreader users: I could imagine them asking for the name of the focused app to be spoken out before choosing to type in their password.

It also rings some bells about the QubesOS concept of putting different apps into different trust zones and indicating this by giving their windows unforgeable, differently coloured borders. This could be exposed to AT-SPI if we had trusted window properties.

lunixbochs commented 7 months ago

if we don't have something like a pid we don't have a good way to bind a unix socket rpc channel to the actual app managing a window
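One way to picture that binding on Linux is the kernel's SO_PEERCRED socket option, which reports the PID of the process on the other end of a Unix socket. The Python sketch below shows the general technique only, not anything Talon is known to do. `peer_pid` and `is_window_owner` are hypothetical names, and `window_pid` stands for whatever PID the compositor or accessibility API would report for the window.

```python
import socket
import struct


def peer_pid(conn: socket.socket) -> int:
    """Return the PID of the process at the other end of a Unix socket.

    Linux-specific: SO_PEERCRED yields a struct ucred of three native
    ints (pid, uid, gid).
    """
    creds = conn.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i")
    )
    pid, _uid, _gid = struct.unpack("3i", creds)
    return pid


def is_window_owner(conn: socket.socket, window_pid: int) -> bool:
    """Check that the RPC peer is the process reported as owning the window."""
    return peer_pid(conn) == window_pid
```

An opaque "application name" property gives nothing to compare the socket peer against, which is the gap the comment above points at.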