Several core architectural features of the Web Platform may allow heuristic detectability of assistive technology

cookiecrook commented 3 years ago

Issue

Several core architectural features of the Web Platform (in HTML, CSS, client-side JavaScript, etc) may be abused to allow heuristic detectability of assistive technology, conflicting with the Web Platform Design Principles.

Background

A recent cross-functional privacy review raised ARIA #1371 against aria-hidden, citing that it may be abused to infer whether the user is running some form of AT. However, the core API used in the proof-of-concept demo is not aria-hidden, but JavaScript event handling. The inclusion of aria-hidden just makes it a little more difficult for an AT user to know something is amiss. See additional discussion in ARIA #1371.

The Privacy group asked ARIA WG to add prose in ARIA 1.3 noting the issue, and we've agreed to do so. However, this is a pervasive issue that affects HTML, CSS, client-side JavaScript and other core technologies of the Web Platform. I'd estimate there are at least dozens, if not hundreds, of heuristic ways to attempt AT detection, meaning many W3C Web technologies conflict with Web Platform Design Principle 2.7 Don’t reveal that assistive technologies are being used.

Many of the detection methods are fallible, but using a combination can result in a reasonable degree of certainty. Some methods may be resolved with bug fixes in rendering engines or AT. Others are more difficult. For example:

Example Methods Addressable by Web Engines or AT

Event objects are notably different: some ATs consistently send x/y coordinates as 0,0 on a click event. Others aren't as rigid, but still send predictable coordinates. Introducing artificial x/y coordinate variance could negate this method.
Event timings are different between AT and mainstream users. Sometimes AT-triggered event sequences (e.g. mousedown/mouseup) occur more quickly than is likely with a physical mouse or trackpad. Introducing artificial runloop delays could resolve this.
Many more…
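
To make the two mitigations above concrete, here is a minimal sketch of what "artificial x/y coordinate variance" and "artificial runloop delays" could look like. This is not taken from any engine or AT; the names `jitteredClickPoint` and `jitteredDelayMs`, the `Rect` shape, and the default spread and delay ranges are all hypothetical, chosen only for illustration.

```typescript
// Hypothetical sketch only: not an actual engine or AT implementation.
interface Rect { left: number; top: number; width: number; height: number }
interface Point { x: number; y: number }

// Pick a click point near the element's center instead of a fixed (0, 0).
// `spread` is the fraction of each dimension the point may move away from
// the center; it is clamped to [0, 0.5] so the point stays inside the rect.
function jitteredClickPoint(
  rect: Rect,
  random: () => number = Math.random,
  spread: number = 0.25
): Point {
  const s = Math.min(Math.max(spread, 0), 0.5);
  const dx = (random() * 2 - 1) * s * rect.width;
  const dy = (random() * 2 - 1) * s * rect.height;
  return {
    x: rect.left + rect.width / 2 + dx,
    y: rect.top + rect.height / 2 + dy,
  };
}

// Pick a delay between simulated mousedown and mouseup so the pair does not
// always fire back-to-back. The 40-120 ms range is an assumed placeholder,
// not a measured value.
function jitteredDelayMs(
  random: () => number = Math.random,
  minMs: number = 40,
  maxMs: number = 120
): number {
  return minMs + random() * (maxMs - minMs);
}
```

The random source is injectable so the behavior can be checked deterministically. Whether uniform jitter is actually hard to tell apart from real pointer input is an open question that this sketch does not answer.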

Example Methods Not as Easily Addressable

The simplest method I'm aware of is: <a href="/confirm_at" tabindex="-1" style="position: absolute; left: -9999px;">only AT users or bots will click this link</a> Attempting to "resolve" this one may cause more harm than good.
AT usually simulates mouseup/mousedown/click sequences, but on desktop platforms, they don't simulate the mouse movement (mousemove) that would naturally occur between elements. Doing so to "resolve" this difference would likely have unintended consequences.
Many more…
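
As a rough illustration of the earlier point that individually fallible methods can combine into a reasonable degree of certainty, the sketch below multiplies likelihood ratios in the naive-Bayes style. It assumes the signals are independent, which real signals often are not, and the prior and ratios are made-up numbers rather than measurements. The function name `combinedProbability` is hypothetical.

```typescript
// Hypothetical sketch: combine weak, independent signals into one estimate.
// prior: assumed base rate of the hypothesis (a number strictly between 0 and 1).
// likelihoodRatios: for each observed signal,
//   P(signal | hypothesis) / P(signal | not hypothesis).
function combinedProbability(prior: number, likelihoodRatios: number[]): number {
  const priorOdds = prior / (1 - prior);
  // Independence assumption: posterior odds = prior odds * product of ratios.
  const posteriorOdds = likelihoodRatios.reduce((odds, lr) => odds * lr, priorOdds);
  return posteriorOdds / (1 + posteriorOdds);
}

// Illustrative numbers only: a 2% prior and signals that are each 5x more
// likely under the hypothesis.
const oneSignal = combinedProbability(0.02, [5]);          // about 0.093
const threeSignals = combinedProbability(0.02, [5, 5, 5]); // about 0.718
```

Under these assumed numbers a single weak signal leaves the estimate below 10%, while three together push it past 70%. This is why normalizing even a few of the differences can meaningfully reduce an observer's confidence.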

Potential Resolutions

Members of the Privacy Interest Group have suggested that these should be called out in every spec. For example, CSS position (and other properties) would note that the property could be abused and is therefore out of alignment with the AT detection design principle. Likewise, many JavaScript events and event object properties would carry a similar note. There was a suggestion to include a watermark in BikeShed and ReSpec, similar to the BikeShed watermark for fingerprinting.

For the bugs that are addressable in browsers or AT, I think it's reasonable to call them out in the open. It will likely incentivize resolution, and spark additional research into related areas.

For the other issues that are not as easily addressable, I and others in the ARIA WG worry that, without a clear path forward, listing these will only serve to stoke fear in users, and/or act as a recipe for malicious actors, allowing them to create new or further refine existing detection methods. The counterpoint is that the open discussion will result in more heads considering how to address the problems. I'm not aware of precedent for a private list of accessibility issues, but the TAG could consider a model similar to security bug lists, or decide to address these in the open.

Given the scope is much broader than anything in the ARIA spec, the ARIA Working Group has asked me to file this issue for the TAG's consideration. We look forward to the discussion and eventual resolution. Thanks for your time.

cookiecrook commented 3 years ago

@shivankaul @jyasskin @marcoscaceres @Alice @LJWatson @hober @jnurthen @jcsteh @jaws-test @carmacleod @pes10k @cynthia @aleventhal

alice commented 3 years ago

I think there are two closely related issues around detecting AT (which we've discussed at length previously):

1. being able to detect with some degree of certainty that a user is likely using AT
2. APIs which allow unambiguous AT detection (e.g. accessibleclick)

The risk with the latter (2) is that it looks like it's actively encouraging AT detection (as shown by emails I received asking when the accessibleclick API would be available so that a product could track which users were using AT, so that they could attempt to measure UX quality for AT users). This means that things which should be a "code smell", such as creating separate interfaces for AT users, may be interpreted as good practice. This is the type of case that "Don’t reveal that assistive technologies are being used" was primarily intended for, I think.

Heuristic detection is also a real risk to users' privacy and user experience, and it should definitely be addressed where possible, but I think it's a distinct issue.

content-visibility is one example of a new spec which includes language explicitly intended to capture this risk and advise about mitigations. However, in that case the language was over-interpreted, such that some content was not exposed to users when they would usually expect it to be.

I think that's a useful case to think through, for the reasons you mention around unintended consequences. With heuristic detection there is often not a clear "ideal" mitigation which doesn't have those unintended consequences.

I almost think it might be worth having a second principle to consider how any user-facing feature (i.e. most things in HTML or CSS) should be exposed to assistive technology APIs, weighing up the user experience and any privacy risk. (And that's before we get into questions of whether a given UX pattern is inaccessible by design, but that's a different conversation.)

In terms of how to address the existing issues like the ones you list, I like the parallel with security disclosures. I'm not sure whether TAG would be the best place for those to live, though - the audience would be spec authors and browser vendors, right?

wseltzer commented 3 years ago

PING is interested in participating in this conversation on identifying and mitigating AT-fingerprinting/detectability in the platform. /cc @samuelweiler @sandandsnow

cookiecrook commented 3 years ago

@ShivanKaul, since you're filing these issues against the lower-risk accessibility specs, do you want to file the CSS issue too? I filed HTML #6533 in the WHATWG tracker.

cynthia commented 3 years ago

@wseltzer Are PING and the relevant a11y folks interested in discussing this on a call in the near future? The meta-issue of "AT is already detectable, what should we do about this" is what we are interested in.

I'm taking an action to address case (2) in @alice's comment and https://github.com/w3c/aria/issues/1371 in the principles document.

pes10k commented 3 years ago

Hi @cynthia, at least speaking for myself and not anyone else from PING at the moment, I'd be happy to discuss on a call. We had some concrete ideas over the last couple of calls that I'll try to list below, but would be happy to discuss more on a call (I'm not advocating for any of these at the moment, but just trying to remember the full list):

Call out AT detection issues in specs the same way fingerprinting issues are (imperfectly) called out in specs, with special markup
Encourage HR groups to push harder to have AT detection issues addressed in specs at their next transitions
Build up a list of known issues so that activists / researchers / vendors etc. might use such a list to address those issues in tools
The above, but make the list secret / not public (to reduce the likelihood of misuse)

There may be others I'm not remembering, too.

cookiecrook commented 2 years ago

@cynthia @pes10k Is this worthy of discussion at TPAC 2022? IIRC, I saw Privacy IG and Privacy CG meetings on Monday Sept 12th.

cynthia commented 2 years ago

Yes, this is definitely a TPAC worthy discussion.

LJWatson commented 2 years ago

Would be interested in a conversation at TPAC.

dbaron commented 2 years ago

Another factor that's worth considering here is not just whether features of the platform allow detecting assistive technology (AT) but whether performance characteristics also allow that detection. My memory from a number of years ago is that it was quite common for some Web APIs to have substantially different performance characteristics when AT was in use, because of the overhead of keeping a separate accessibility tree up-to-date. This would make the speed of some operations (those affected by this overhead) much slower when compared to others (those not affected or less affected). I think these performance differences may not be as dramatic as they were a few years ago (although there are also likely cases where they've gotten worse as a result of architectural changes in browsers related to process separation), but I think it's pretty likely that they're still present in a number of cases.

I think a credible plan for avoiding detectability of AT should explain how it would address detectability through performance characteristics.

cookiecrook commented 2 years ago

"a credible plan for avoiding detectability of AT should explain how it would address detectability through performance characteristics."

That's one of probably dozens of ways, but if our goal is avoiding detectability entirely, we could be setting ourselves up for failure. I don't want an assumption that it can never be 100% effective to turn into justification to give up the effort entirely.

I'd focus on what progress can be made toward normalizing the heuristic differences between AT usage and mainstream usage, rather than the perhaps unreasonable goal of 100% undetectable. Or, perhaps to normalize the difference between different types of AT... IOW, there may always be a way to detect that some AT is active, but the web developer should not be able to make disability inferences such as, User A is quadriplegic, and User B is using a voice utility out of convenience.

cookiecrook commented 2 years ago

Is this on the list for TPAC discussion? I'll be in Vancouver the whole week. I could also propose it as a breakout session if that's better.

cookiecrook commented 1 year ago

I'll be at TPAC again this year if this is on the table for discussion.

cookiecrook commented 1 month ago

@matatk FYI

w3ctag / design-principles