w3c / at-driver

AT Driver defines a protocol for introspection and remote control of assistive technology software, using a bidirectional communication channel.
https://w3c.github.io/at-driver

Consider using the VoiceOver AppleScript bridge instead of disabling System Integrity Protection (SIP) #74

Open cookiecrook opened 8 months ago

cookiecrook commented 8 months ago

I have raised concerns a number of times (some of which is captured in #11 and #12) about standardizing AT control from WebDriver via the proposed open key combo API. This Key Press API requirement seems like a deal-breaker for security reasons, since the AT Driver group's current approach for Mac requires disabling System Integrity Protection (SIP).

That said, I do think that automating VoiceOver with different browsers on Mac is achievable with VoiceOver's existing AppleScript automation bridge and osascript.

See the full VoiceOver AppleScript interface in:

Example functionality includes:

In particular, you may be interested in the functionality of `perform command`, which takes the string name of any VoiceOver command and executes it.

`perform command v`

Likewise `open`, which gets you to a number of places Matt King wanted to verify.

`open v`
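As a sketch of how the `perform command` verb can be driven from outside VoiceOver: the AppleScript below can be sent through `osascript`. This assumes macOS with VoiceOver running and the "Allow VoiceOver to be controlled with AppleScript" setting enabled, and "move right" is just one illustrative VO commander command name.

```python
import subprocess
import sys

def vo_perform_command(command: str) -> list[str]:
    """Build an osascript invocation for VoiceOver's `perform command` verb."""
    script = (
        'tell application "VoiceOver" to tell commander '
        f'to perform command "{command}"'
    )
    return ["osascript", "-e", script]

argv = vo_perform_command("move right")

# Executing only makes sense on macOS with VoiceOver running and its
# AppleScript control checkbox enabled; elsewhere we just build the argv.
if sys.platform == "darwin":
    subprocess.run(argv, check=True)
```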

I believe this interface should give you as much functionality as the keyboard hotkey simulation your current AT Driver plans expect. Since you’re still planning to write a per-screen-reader hotkey translation layer (command x is “a” on NVDA and “b” on VoiceOver), it seems reasonable that the translation layer could instead send any VO command through the AppleScript bridge via osascript, rather than following the current proposal, which requires disabling System Integrity Protection (SIP) and simulating keyboard HID events.

If for some reason, you find functionality that is not provided by VoiceOver’s AppleScript bridge, that may be addressable in a bug report or enhancement request.

As a tangent, if there is some functionality you need that is not provided by that bridge, there are potentially other ways to get it. For example, if you wanted more granular control of and introspection into the speech synthesizer, you could embed new functionality in your own custom TTS voice, which may help with AT Driver testing by effectively inserting your script directly into VO’s speech output stream.

Apple provides sample code for developing a custom speech synthesizer, which could, for example, be used by VoiceOver to log all speech jobs or trigger other notifications when speech jobs are sent: https://developer.apple.com/documentation/avfaudio/audio_engine/audio_units/creating_a_custom_speech_synthesizer?changes=_5&language=objc

I have mentioned the AppleScript and custom TTS voice paths before in AT Driver meetings and emails, but in less detail… I'm hopeful this GitHub issue can be a more permanent record.

lolaodelola commented 8 months ago

Thanks for this @cookiecrook.

cc @mcking65 @jugglinmike

jlp-craigmorten commented 8 months ago

I appreciate I'm jumping into an issue without context, but hopefully these are some useful points:

In particular, you may be interested in the functionality of `perform command`, which takes the string name of any VoiceOver command and executes it.

To add some colour to the suggestion, I have had reasonable success wrapping around this interface for VoiceOver automation. See https://github.com/guidepup/guidepup/blob/main/src/macOS/VoiceOver/performCommand.ts#L13 for one such manifestation of the suggestion. Once you have enumerated all VO commander commands, this allows you to make use of almost all VO functionality with relative ease.

This Key Press API requirement seems like a deal-breaker for security reasons, since the AT Driver group's current approach for Mac requires disabling System Integrity Protection (SIP).

One thing to note is that driving VoiceOver with AppleScript requires SIP to be disabled at some point, because the file that controls whether you can drive VO with AppleScript is SIP protected. That said, once the appropriate file entries are set, SIP need not remain disabled for continued use.

See https://github.com/actions/runner-images/issues/4770 for the solution applied to GitHub Actions runners.

cookiecrook commented 7 months ago

@jlp-craigmorten wrote:

One thing to note is that driving VoiceOver with AppleScript requires SIP to be disabled at some point, because the file that controls whether you can drive VO with AppleScript is SIP protected.

You do not need to disable SIP to automate VO via AppleScript... You can enable it from VoiceOver Utility > General > Allow VoiceOver to be controlled with AppleScript.

That's a hardened checkbox (magic!) in the UI that can't be scripted with untrusted events, but you can manually enable it on your automation machines via the mouse, keyboard, or AT... Disabling SIP is a much less secure shortcut to bypass that security hardening, and it's not recommended.

jscholes commented 7 months ago

@cookiecrook

That's a hardened checkbox (magic!) in the UI that can't be scripted with untrusted events, but you can manually enable it on your automation machines via the mouse, keyboard, or AT...

Are you indicating that the checkbox cannot be programmatically toggled at all, but only with human intervention? If so, that will be untenable for many CI/CD-like setups where machines are spun up on demand, to be driven by other machines. I would guess that was the reason for @jlp-craigmorten's suggestion.

cookiecrook commented 7 months ago

I’m not particularly concerned if CI administrators choose to willingly disable SIP on an automation cluster. That’s a risk any knowledgeable system administrator can assess on a case-by-case basis.

However, I’d be more concerned if setup instructions for a development environment (often run on people’s personal devices) instructed the runner to disable SIP on their primary machine. I was pointing out a factual inaccuracy in Craig’s statement that “disabling SIP is required” (it’s not, and most people shouldn’t). I hope that clarification can be acknowledged in this project’s setup docs and in Craig’s guidepup project, if the SIP vulnerability is recommended there.

jlp-craigmorten commented 7 months ago

To clarify the "disabling SIP is required"-like statement: this was only intended in the context of CI-like scenarios, where SIP only needs to be disabled for the duration of setting an entry in said file (and can then be re-enabled immediately afterwards for the agent/runner/etc.), if a scripted approach to enabling AppleScript for VO is taken. There is also the alternative of using UI scripting to check the aforementioned checkbox, though this (likely) requires the UI script to be passed credentials, which is unlikely to be fit for purpose in CI environments (e.g. GitHub Actions, where the user doesn't own the agent) and has its own security implications. There might be other options, but none that I'm aware of.

For manual scenarios, e.g. local development in the context of this project, I would strongly suggest users don't disable SIP (and as an aside, hopefully my project/documentation doesn't suggest otherwise - please shout if you feel I can improve anything in particular!).

Given that I presume this project plans to extend beyond the realm of local/manual setup, my comment was just to flag that SIP still needs some consideration with any approach involving AppleScript for VoiceOver (though perhaps to a lesser extent than with the Key Press API being discussed), assuming that not all scenarios/use-cases allow for manual configuration of the automation machine. If this is the direction taken, documentation is likely the answer: discuss the configuration options with their associated security considerations clearly defined.

I'll try to clarify my statements better in future 😅 (and apologies for causing a distraction away from the main point of this issue).

cookiecrook commented 3 months ago

Also forgot to mention @ckundo's AccessLint project: https://github.com/AccessLint/screenreaders

jugglinmike commented 2 weeks ago

I believe this interface should give you as much functionality as the keyboard hotkey simulation your current AT Driver plans expect. Since you’re still planning to write a per-screen-reader hotkey translation layer (command x is “a” on NVDA and “b” on VoiceOver), it seems reasonable that the translation layer could instead send any VO command through the AppleScript bridge via osascript, rather than following the current proposal, which requires disabling System Integrity Protection (SIP) and simulating keyboard HID events.

The hotkey translation layer exists in the ARIA-AT project (e.g. for the "Alert" tests), not in the AT Driver draft community group report. That division of responsibility significantly reduces the normative requirements of the proposed standard, allowing it to "simply" specify the conditions under which any key should be pressed while remaining agnostic to the meaning of the keys themselves.
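To make the division of responsibility concrete, here is a hypothetical sketch of such a translation layer: abstract test commands map to a per-screen-reader realization, which could be a simulated key press for NVDA or a VO commander command sent over the AppleScript bridge for VoiceOver. All names below (`COMMAND_MAP`, `dispatch`, `readNextItem`, `sendkeys`) are illustrative, not part of the AT Driver report or the ARIA-AT project.

```python
import subprocess
import sys

# Abstract command -> per-screen-reader realization. For NVDA the
# realization is a key sequence; for VoiceOver it could instead be a
# VO commander command name delivered via the AppleScript bridge.
COMMAND_MAP = {
    "readNextItem": {
        "nvda": ("keys", "downArrow"),
        "voiceover": ("applescript", "move right"),
    },
}

def dispatch(command: str, screen_reader: str) -> list[str]:
    """Return the argv that realizes an abstract command for one AT."""
    kind, payload = COMMAND_MAP[command][screen_reader]
    if kind == "applescript":
        script = (
            'tell application "VoiceOver" to tell commander '
            f'to perform command "{payload}"'
        )
        return ["osascript", "-e", script]
    # A real implementation would call into a key-simulation library
    # here; this placeholder just returns a description of the press.
    return ["sendkeys", payload]

argv = dispatch("readNextItem", "voiceover")
if sys.platform == "darwin":  # only meaningful on macOS with VO control enabled
    subprocess.run(argv, check=True)
```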

This of course does not prevent AT Driver from accommodating Apple's security restrictions, but I hope it helps explain the challenge involved. We are considering whether we can address this with some combination of protocol-level "capabilities", a new "user intent" command, and supplemental informative documents which describe implementation-specific behavior.

As a tangent, if there is some functionality you need that is not provided by that bridge, there are potentially other ways to get it. For example, if you wanted more granular control of and introspection into the speech synthesizer, you could embed new functionality in your own custom TTS voice, which may help with AT Driver testing by effectively inserting your script directly into VO’s speech output stream.

Apple provides sample code for developing a custom speech synthesizer, which could, for example, be used by VoiceOver to log all speech jobs or trigger other notifications when speech jobs are sent: https://developer.apple.com/documentation/avfaudio/audio_engine/audio_units/creating_a_custom_speech_synthesizer?changes=_5&language=objc

Yup, we ruled out polling the "last phrase" API in 2022, and as noted, that's motivated development of a TTS voice: first using the "Speech Synthesis Manager" and more recently using the new API you've referenced. Getting the latter running has been an adventure!