w3c / aria

Accessible Rich Internet Applications (WAI-ARIA)
https://w3c.github.io/aria/

Need an attribute for direct interaction elements on a touch screen #1215

Open minorninth opened 4 years ago

minorninth commented 4 years ago

Screen readers with touch screen support typically include a "touch exploration" mode where you can tap or slowly drag around the screen and listen to feedback on what you're touching, before it activates. To actually activate, you double-tap.

There are a few cases where this is undesirable - like a virtual keyboard, or a signature pad. In those cases you want gestures to be passed through directly.

Some native accessibility APIs already have a way to specify this, like UIAccessibilityTraitAllowsDirectInteraction on iOS.

We should have a similar ARIA role or attribute for this.

I think at one point I suggested role="key" for a keyboard key, but I think the concept is more generic. Besides the signature pad, it could also be useful for a musical instrument, a game, or many other things.

It seems related to aria-interactive. The idea behind aria-interactive is that the control has its own keyboard support on a desktop computer. This idea is that the control has its own touch event support on a mobile touch device. They're quite similar!

So one idea would be: aria-interactive="touch keyboard mouse", etc. where you choose from various tokens.

Or, we could make it separate, like:

role="touch" role="directtouch" aria-touch="announce" vs aria-touch="activate" aria-touchactivate="true"

cookiecrook commented 4 years ago

IMO, this should not be a role... It would be on a container (with its own role) that may have one or more accessible elements inside the container.

One native example is the real-time instrumentation view of GarageBand on iOS. You can touch once with VoiceOver to explore the layout (e.g. spatial placement of piano keys or drums), then subsequent touches pass through directly to let you play the instrument in real time... Touching outside the container will reset the behavior, so that you need to select the container again in order to get subsequent real-time touches passed through.

minorninth commented 4 years ago

Great example!

Can you provide any more technical details on how GarageBand implements this? Does GarageBand just toggle UIAccessibilityTraitAllowsDirectInteraction after you explore the instrument the first time? Or is UIAccessibilityTraitAllowsDirectInteraction always set and it changes what events it fires? Or is there some more advanced attribute that makes VoiceOver behave that way?

Any ideas for the attribute name?

It seems like great low-hanging fruit, since it'd be relatively straightforward to implement and ship on both iOS and Android without requiring any new native APIs.

cookiecrook commented 4 years ago

> Can you provide any more technical details on how GarageBand implements this?

GarageBand just exposes the trait on the container (and leaves it), and VoiceOver does the rest.

cookiecrook commented 4 years ago

Here's a video demo that might work as an explainer. https://youtu.be/P056zcubhxQ

cookiecrook commented 4 years ago

> Any ideas for the attribute name?

This interaction style is unlikely to be limited to touch (an eye-tracker pass-through for example), but I don't like any of the other names. aria-pointer? aria-manipulate? I'm hopeful a better name will arise.

`aria-touch: undefined | direct | ...` (open ended for future expansion if needed)

We have a VoiceOver design principle of "safe exploration" so users don't accidentally trigger unwanted or unknown behavior. For example, I would still expect VoiceOver and other SRs to announce the element on first touch (e.g. hear "signature" the first time then touch again to sign). I wouldn't want authors to be able to bypass VoiceOver's "touch to explore" behavior without at least the initial selection confirmation.

We should also consider safety restrictions... For example, there's risk that a web dev could put this on the body element and therefore break the VO user's experience for the whole page. However, there might be some legitimate reason for doing that, if the application is entirely self voicing.

cookiecrook commented 4 years ago

aria-manipulation is growing on me.

Some of this may be complementary to the "activation point" discussion in #788.

carmacleod commented 4 years ago

Please also read through the (very up-in-the-air, but has some points) discussion about aria-interactive in https://github.com/w3c/aria/issues/746. If it is possible to merge the ideas into one concept, then maybe that would be the most universally useful?

What role would that piano keyboard have in a web app? (Heh, "application"? With roledescription="keyboard"?) :)

cookiecrook commented 4 years ago

> What role would that piano keyboard have in a web app?

Probably a container role (main in the case of that specific UI) with individual buttons for each piano key.

minorninth commented 3 years ago

I'm not convinced that the overlap with aria-interactive is that high. None of the use cases in the aria-interactive bug would likely need direct touch support.

aria-manipulation is an interesting idea for a name, what would the possible values be?

I think I'm feeling more strongly that either "touch" should be in the name, or the value. This really is specific to touch.

cookiecrook commented 3 years ago

@minorninth wrote:

> aria-manipulation is an interesting idea for a name, what would the possible values be?

Same as above? `aria-manipulation: undefined | direct | ...`

Open-ended values for future expansion if needed... For example, iOS VO's keyboard typing modes are somewhat like variants of direct touch.

> I think I'm feeling more strongly that either "touch" should be in the name, or the value. This really is specific to touch.

I think we could live with aria-touch, but is it really specific to touch? Electronic document signatures as a use case came up again recently (e.g. DocuSign)… Would use of a stylus or a laptop trackpad count as "touch"?

cookiecrook commented 3 years ago

Or maybe aria-manipulate?

cookiecrook commented 3 years ago

@jnurthen @carmacleod this issue has a "NeedsExplainer" label on it. What should that cover that isn't explained in the description? If this thread covers it sufficiently already, I can draft a PR, or we can make it an agenda item on an upcoming weekly call.

jnurthen commented 3 years ago

@cookiecrook if you think there is enough to draft a PR then please go ahead. I think it would be handy to have a little more detail so AT know what they would need to do with such a feature - but that can certainly be added later.

minorninth commented 3 years ago

@cookiecrook for the case of DocuSign, can you think of a different behavior that you'd want to enable with any existing AT and a mode other than touch? For example, does VoiceOver on Mac have support for signing on the trackpad or anything like that?

My general inclination is that it's better to be specific and clear, and generalize later - unless we have a specific and clear idea for how it generalizes.

Right now "direct touch" interaction is the only thing I'm aware of that seems well-supported on multiple platforms with pretty clear semantics.

cookiecrook commented 3 years ago

@dlibby- mentioned considering this in the context of CSS touch-action https://developer.mozilla.org/en-US/docs/Web/CSS/touch-action
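
For context, CSS touch-action is an existing, shipped property, but it controls the browser's own gesture handling (panning, pinch-zoom), not AT gesture interception, so it would be complementary to whatever attribute comes out of this issue. For example:

```html
<!-- touch-action is real CSS: it stops the browser from claiming the gesture
     for scrolling/zooming, but a screen reader's touch interception is a
     separate layer that touch-action does not affect. -->
<canvas id="signature-pad" style="touch-action: none" width="400" height="150"></canvas>
```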

cookiecrook commented 3 years ago

> @cookiecrook for the case of DocuSign, can you think of a different behavior that you'd want to enable with any existing AT and a mode other than touch? For example, does VoiceOver on Mac have support for signing on the trackpad or anything like that?

VO on Mac has Trackpad Commander, which still uses touch but works a little differently, in that it's not tied to a finite spatial layout like a touch screen... The trackpad coordinates are relative, mapped to the coordinates of the element in the VO cursor without regard to aspect ratio.

cookiecrook commented 3 years ago

@minorninth wrote:

> @cookiecrook for the case of DocuSign, can you think of a different behavior that you'd want to enable with any existing AT and a mode other than touch?

I thought of one more this morning... Switch Control on iOS has a freehand path feature that is somewhat deep in submenus by default, because its usage isn't common. Mainly used for drawing apps.

Surfacing the "direct touch" nature of an element would allow the AT to surface those lesser used AT features more conveniently. For example, the freehand (and multi-touch) options could be moved to a temp space in the main Switch Ccontrol menu, similar to how we surface Actions when available.

cookiecrook commented 3 years ago

I haven't considered fully, but there may be a case for multiple mix-and-match values, and an all value (equivalent to direct). I'm not certain how the implementations would differ though, or if this is necessary.

```
aria-manipulate: undefined | freehand | multitouch | … | all
```

ckundo commented 3 years ago

another use case I wanted to share, maybe an extension of drawing, is dragging and transforming objects in a 2D or 3D canvas context. ideally the author would have keyboard handling for these kinds of operations as well, but on touch or for low vision users using screen readers, it'd be helpful to have this feature.

fightliteracy commented 3 years ago

Having a distinction between one-finger-only and multi-finger requirements would probably be useful. If an area requires multiple fingers, other system-wide gestures have to be ignored.

Would freehand correspond to "single finger" mode?

cookiecrook commented 3 years ago

> Having a distinction between one-finger-only and multi-finger requirements would probably be useful. If an area requires multiple fingers, other system-wide gestures have to be ignored.

Good point. Maybe single versus multi-touch is the only distinction that matters... I'm coming back around to Dominic's initial attribute name aria-touch... it's less likely to be misunderstood by web authors. Values might be `undefined | single | multiple`? Or multipoint?

Do we anticipate that any element implementing this should be self-voicing? All the examples I can think of should "self-voice" either through sound (e.g. the piano keys) or speech via an ARIA live region.

Also, the draft should include a note reminding web authors to respect touchcancel events.
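
As a sketch of what "respect touchcancel" means in practice (the element ID and commitStroke() are illustrative names, not from any spec):

```html
<script>
  // Illustrative direct-touch signature pad handlers.
  const pad = document.getElementById('signature-pad');
  let stroke = [];

  pad.addEventListener('touchstart', (e) => {
    e.preventDefault(); // keep the browser from scrolling/zooming
    stroke = [];
  });

  pad.addEventListener('touchmove', (e) => {
    // e.touches.length distinguishes single- from multi-finger input,
    // which matters for the single vs. multipoint discussion above.
    const t = e.touches[0];
    stroke.push({ x: t.clientX, y: t.clientY });
  });

  pad.addEventListener('touchend', () => {
    commitStroke(stroke); // hypothetical: persist the finished stroke
  });

  // touchcancel fires when the system or AT takes the gesture away
  // (e.g. an incoming call, or a gesture claimed elsewhere), so the
  // in-progress stroke must be discarded, never committed.
  pad.addEventListener('touchcancel', () => {
    stroke = [];
  });
</script>
```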

cookiecrook commented 3 years ago

@ckundo wrote:

> dragging and transforming objects in a 2D or 3D canvas context.

If object-based (with sub-DOM nodes or TBD AOM virtual nodes), that could be a use case for Issue #762, aka user actions. But yes, canvas would have to be entirely self-implemented with current APIs, so a "direct touch" equivalent could assist in that.

fightliteracy commented 3 years ago

aria-touchpassthrough would be clearer to me.

Single / multiple then makes sense (to me).

I think most applications would require some form of aria-live feedback, but I can also imagine that something that just needs you to enter a signature might not need any self-voicing/aria-live.

cookiecrook commented 3 years ago

@minorninth @carmacleod @jnurthen What do you think about touchpassthrough? Verbose, but definitely the most author-understandable suggestion so far.

minorninth commented 3 years ago

I think the word "passthrough" is very clear, but aria-touchpassthrough="true" is quite long. How about aria-touch="passthrough" vs a default of aria-touch="explore" or "default"?

cookiecrook commented 3 years ago

> How about aria-touch="passthrough"

That might prevent the single vs multi distinction. Is that a dealbreaker?

minorninth commented 3 years ago

Is single vs multi passthrough possible in a native app on iOS or any other platform now? We shouldn't spec something we couldn't implement anytime soon.

Or why not allow for something like aria-touch="passthrough multi" in the future? I still think that's more readable.

But either way I'm willing to go along with any consensus. aria-touchpassthrough works.

mcking65 commented 3 years ago

Slightly off topic ... if we add this, is this the final straw that will lead to iOS and Android mappings in core-aam?

patrickhlauke commented 3 years ago

just as a side note, i'd suggest staying away from explicitly calling this anything "touch", as that just brings pain later on when it then also applies to other input modalities. (it's now biting us in the back with Pointer Events, where touch-action also applies to stylus/pen on touchscreen/digitizer interactions, so the spec ends up using the term touch-action and then spends a lot of time explaining that no, it's not just "touch".)

minorninth commented 3 years ago

@patrickhlauke thanks, that's a good point. I think we want to be open to the possibility of supporting other modalities, while also recognizing that today, in practice, this only applies to touch so I think it'd be a mistake to overgeneralize too.

Previously I suggested aria-touch="passthrough". Maybe that was backwards, and instead we should propose:

aria-passthrough="touch"

The allowed value is a space-separated list of tokens, so that in the future we could potentially allow, e.g.

aria-passthrough="touch multitouch mouse stylus"

That addresses many of the previous concerns, and I think it's more readable than aria-touchpassthrough="true".
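
Under that shape, the GarageBand-style piano container from earlier in this thread might look something like this (aria-passthrough is still entirely hypothetical):

```html
<!-- Hypothetical aria-passthrough token list on a container. The keys stay
     ordinary accessible elements, so explore-by-touch still works until the
     user opts into the container. -->
<div role="group" aria-label="Piano" aria-passthrough="touch">
  <button aria-label="C4">C</button>
  <button aria-label="D4">D</button>
  <button aria-label="E4">E</button>
</div>
```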

fightliteracy commented 3 years ago

Dominic, I think your proposal makes a lot of sense. It would be directly applicable and implementable.

Would there be a token that could account for all input passthroughs?

minorninth commented 3 years ago

> Would there be a token that could account for all input passthroughs?

You mean, like aria-passthrough="all"?

I guess my question is, are there any platforms where mouse clicks are not currently passed through, even when a screen reader is running? I thought that only touch inputs were captured by the screen reader. I'm not as sure about stylus.

I think we should leave open the possibility of "mouse" or "all", but not actually specify something that we couldn't implement in practice yet, or that wouldn't actually be helpful or useful in practice.

patrickhlauke commented 3 years ago

maybe describe more what the characteristic of the element is, rather than what the AT/UA should do? maybe something like aria-allowsdirectmanipulation="true" or something (i've been toying with calling this sort of thing "direct manipulation" over in the Pointer Events spec, FWIW...an imperfect name still, but a bit more generic)

minorninth commented 3 years ago

One worry I'd have about that is that someone could plausibly argue that a slider supports direct manipulation. But in practice, what users might actually want is a way to set the slider to a discrete value in a well-controlled way - which is not necessarily via direct manipulation. Adding this attribute to a slider could actually make it less accessible because it'd be incredibly difficult for some users to focus it without accidentally modifying it.

So we only want this to apply to something that requires direct manipulation, not just something that supports it.

It is somewhat related to aria-interactive. One problem we were trying to solve there is that screen readers on Windows are modal and often intercept keys, but some controls need all keys passed through. role=application is an imperfect solution; what's really needed sometimes is for a screen reader to treat the control similarly to a text box or list box, where most keys still go to that control while it's focused, and the screen reader automatically enters "focus" mode / forms mode when it's focused.

I still think they're similar but not the same thing, but I'm open.

If we wanted to combine them, we could say:

aria-interactive="touch mouse keyboard", which would mean that touch, mouse, and keyboard events should not be intercepted by AT if at all possible and should be passed through to this element when it would be the event target.

Another difference, though, is that "touch" could be implemented now by user agents because many platforms already support some way to achieve touch passthrough, whereas aria-interactive=keyboard would require screen readers to buy in and choose to implement it.
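
If the two were merged, a widget that owns both its keyboard and touch handling might be marked up like this (aria-interactive is the unspecced idea from #746, so this is only a sketch):

```html
<!-- Hypothetical merged attribute: ask the AT not to intercept these
     modalities while this element is the event target. -->
<div role="application" aria-label="Level editor" tabindex="0"
     aria-interactive="touch keyboard">
  <canvas id="editor" width="800" height="600"></canvas>
</div>
```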

patrickhlauke commented 3 years ago

> So we only want this to apply to something that requires direct manipulation, not just something that supports it.

yeah, it would be opt-in with the attribute. maybe dropping the s on "allows" and making it aria-allowdirectmanipulation?

minorninth commented 3 years ago

To be more concise, how about:

aria-directinput

So possible values might be:

aria-directinput="touch"
aria-directinput="keyboard"
aria-directinput="touch mouse keyboard"
patrickhlauke commented 3 years ago

does it need to specify any input mechanism at all? (if so, there's also pen.) as an aside: for windows-based AT, this starts to sound a lot like role="application"'s effect for keyboard control.

fightliteracy commented 3 years ago

I'm thinking: if you want things like touch to go through but you're not sure of all the variants, what do you put?

Will everyone be savvy enough to realize that stylus and touch are treated the same for VoiceOver on iOS, for example?

fightliteracy commented 3 years ago

A difference is that this is usually used on a small region that’s not meant to impact the rest of the screen.

patrickhlauke commented 3 years ago

> A difference is that this is usually used on a small region that's not meant to impact the rest of the screen.

you'd likely want the same here as well, as otherwise you'd get in the way of users actually being able to confidently use touch-AT gestures?

pkra commented 2 years ago

Moving this issue to the 1.4 milestone (matching #1319)

cookiecrook commented 1 year ago

This stalled a bit, but in general I think aria-directinput and aria-passthrough are good names. Slight preference for "directinput", since my hunch is that it's less prone to typographical errors. To address the comment about how to include all the variants, the WG could consider a catch-all token like all or true:

```
aria-directinput: [none] | all | touch | mouse | keyboard | …
```

cookiecrook commented 1 year ago

The more I look at this, the more I think it's unlikely web authors will get the modality-specific values right, especially a catch-all. Let's reconsider whether we need this now, or if we do, consider a more general value.

```
aria-directinput: [ undefined | direct ]
```
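
With the simplified shape, authors would signal only the interaction style rather than enumerating modalities; a sketch:

```html
<!-- Hypothetical simplified form: a single generic "direct" token. -->
<div aria-label="Signature pad" aria-directinput="direct">
  <canvas id="signature-pad" width="400" height="150"></canvas>
</div>
```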

frastlin commented 1 year ago

Hello! We are currently needing to create an entire set of mobile apps just to add direct touch, which pretty much defeats the point of using the browser.

I think role="application" should require mobile screen readers to display the direct touch option, or, when you tap on an area with role="application", it should activate direct touch with a screen-reader-specific way of exiting the application. I really think VO and TalkBack should allow the user to enter direct touch or pass through gestures whenever the user wants, but that's unrelated.

I don't think aria-passthrough is needed on top of role="application"; there's no reason for both to exist. I don't think there's a need for half of the gestures to work with a screen reader and the other half not. As long as it's easy to switch in and out of the direct input mode, it would be like the input mode that's activated on desktop. Using role="application" also doesn't require the API to change, and should be a pretty easy fix for browser developers to implement.

In fact, the documentation on role="application" mentions touch, but iOS, as of today, doesn't provide direct touch as an option in the rotor when an element in the application container is focused. Here is the description from MDN:

> The application document structure role, indicates to assistive technologies that this part of the web content contains elements that do not conform to any other known HTML element or WAI-ARIA widget. Any sort of special interpretation of HTML structures and widgets should be suspended, and control should be completely handed over to the browser and web application to handle mouse, keyboard, or touch interaction.

It would be useful to reiterate that role="application" also includes touch on mobile devices.
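
For concreteness, under this suggestion markup like the following (role="application" is existing ARIA) would be enough for a mobile screen reader to offer a direct-touch option, which, per the comment above, iOS VoiceOver does not currently do for web content:

```html
<!-- role="application" is real ARIA. The suggestion is that mobile screen
     readers should surface (or auto-enable) direct touch inside such a
     container, with an SR-specific way to exit it. -->
<div role="application" aria-label="Map viewer" tabindex="0">
  <canvas id="map" width="800" height="600"></canvas>
</div>
```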

cookiecrook commented 11 months ago

> We are currently needing to create an entire set of mobile apps just to add direct touch, which pretty much defeats the point of using the browser.

@frastlin Can you explain more about this use case? I understand the technical need (and gave a music app as a user interface example above), but I think it would help if you could explain the user interface need you are abstractly referencing… potentially with pointers to the specific apps, if appropriate.

frastlin commented 11 months ago

We are making an inclusive digital map viewer that's accessible to blind users, and currently it works with a Bluetooth keyboard attached to an iPhone, but we can't activate direct touch with VoiceOver (VO) on iOS to pass touch gestures to our app: https://audiom.net

We would like to embed this component into iOS apps through a web view, but if we can't activate direct touch, we're going to need to make a special native app for each mobile platform just to interact with touch screens.

There are also browser games that have done different work-arounds to bypass the browser's lack of flexibility with touch gestures and VO: https://www.iamtalon.me/cyclepath/ They use the gyroscope and accelerometer instead of the touchscreen for the actual gameplay, which is fine, but is not what someone wants when they're viewing a map. We've done co-designs with blind users, and they want to just be able to tap in a direction and move in that direction (which VO doesn't allow).

Similarly, Google Sheets and Google Docs are limited in their usefulness on touch devices for VO users because the VO experience for HTML tables and rich text areas is horrendous, and there's nothing developers can do to make the experience better through custom gestures.

Numerous applications built with native Swift or Objective-C code on iOS have direct touch, including:

- The Invisible Puzzle
- all the OBJECTIVE ED games
- all the Blindfold Games
- MBraille
- Ariadne GPS
- etc.

Let me know if you need more; these are just the apps I can think of that are on my phone.

frastlin commented 10 months ago

Hello! Note that with NVDA, there is also no touch passthrough by default. In order to get touch passthrough, the user needs to go into Preferences > Settings > Touch Interaction and uncheck "Enable touch interaction support". There is no NVDA touch command to enable or disable this interaction. The user also needs to go into touch settings on Windows and uncheck 3- and 4-finger touch gestures.

As a developer and user, I would assume all this would be done when I enter an application area where all keyboard input is being sent to the application. Why would I think touch gestures receive special treatment? I haven't tried JAWS yet, but I would assume it would be equal to or worse than NVDA for this.

zphrs commented 6 months ago

Hi! I'm currently working on creating a universally accessible digital deck of cards using WebGL, the gyroscope, voice synthesis, and touch gestures. Overall, after reading through this thread, I agree with Dominic's proposal:

> To be more concise, how about:
>
> aria-directinput
>
> So possible values might be:
>
> ```
> aria-directinput="touch"
> aria-directinput="keyboard"
> aria-directinput="touch mouse keyboard"
> ```

I think that having a catch-all would lead to developers overstating their app's abilities, resulting in apps which claim support for "all" but might be missing support for certain input types. I also think that, with the addition of new forms of interaction to the web (like gamepad controllers and Apple's eye tracking), it would make sense for developers to manually opt into any interactions that they support, encouraging them to think about whether their app actually properly supports that gesture type. To allow for a "catch-some", something like aria-directinput="pointer" could enable passthrough for all pointer events, opting into touch, mouse, and pen without the developer having to specify all three.

I also agree that one-finger gesture support is plenty to allow applications to do what they need to do. This would also mean that if a developer supports touch, then they essentially support mouse, pen, and any other single-pointer interface as well. One UI pattern I could think of that would allow more gesture-based quick actions is a press-and-drag to open a flower menu, where dragging outward from the initial touchpoint toward one of eight compass directions would select one of the options in the flower menu (see the sketch below). A press and hold without a drag could activate the menu and read out the options as well as the corresponding direction where each is located.
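
A sketch of one way the flower menu's drag direction could be computed from a single pointer (the names and thresholds are illustrative, not from any proposal):

```html
<script>
  // Illustrative: map a drag vector to one of 8 directions for a
  // "flower menu" opened by press-and-hold.
  const OPTIONS = ['right', 'down-right', 'down', 'down-left',
                   'left', 'up-left', 'up', 'up-right'];

  function directionFromDrag(startX, startY, endX, endY) {
    const dx = endX - startX;
    const dy = endY - startY;
    if (Math.hypot(dx, dy) < 20) return null; // ignore tiny movements
    // atan2 gives the drag angle in radians (0 = right, positive = down
    // in screen coordinates); quantize it to 45-degree octants.
    const angle = Math.atan2(dy, dx);
    const octant = Math.round(angle / (Math.PI / 4)) & 7;
    return OPTIONS[octant];
  }
</script>
```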

For my application, for now I'm just going to recommend that people disable their accessibility tool before starting the app, and ensure that I manually call dictation when needed. Ideally, though, I would use this new API to avoid having to tell users to disable their accessibility settings, even temporarily.

frastlin commented 6 months ago

"I also agree that one-finger gesture support is plenty to allow applications to do what they need to do." I'm a Voice Over user and can tell you this is absolutely not the case and should never be considered. The point of having this direct input functionality is to override the existing Voice Over gestures e.g., 2 finger double tap, 3 finger single tap, swipe to the right with 1 finger and hold, and 4 finger triple tap. It's really difficult for a Voice Over user to use a touchscreen application with only 1 finger. It would be like having a single fingertip sized hole you can see the app interface through on the screen. You would need to move that fingertip around the entire screen every time to find the next control. That would take hours.

The problem with telling a VoiceOver user to disable their screen reader is that the browser has all kinds of junk (bookmarks, tabs, the address bar) at the top and bottom of the screen, and the user will tap on these junk areas without meaning to, which will take them out of the app.

Our current work-around is to create an app with a native WebView with direct touch enabled for the view, and enter a URL into the web view. This allows both direct touch and non-direct touch within the web view, but it's a bit janky: in order to exit direct touch, the user needs to start activating the home screen by swiping up from the bottom, then, before lifting their finger, start using the rotor to deactivate direct touch. VoiceOver doesn't have a way for users to easily stop direct touch. On the keyboard, Caps Lock is used as the universal screen reader key, where the user can activate or deactivate direct keyboard input to games and other applications.

zphrs commented 6 months ago

"I also agree that one-finger gesture support is plenty to allow applications to do what they need to do." I'm a Voice Over user and can tell you this is absolutely not the case and should never be considered. The point of having this direct input functionality is to override the existing Voice Over gestures e.g., 2 finger double tap, 3 finger single tap, swipe to the right with 1 finger and hold, and 4 finger triple tap. It's really difficult for a Voice Over user to use a touchscreen application with only 1 finger. It would be like having a single fingertip sized hole you can see the app interface through on the screen. You would need to move that fingertip around the entire screen every time to find the next control. That would take hours.

To address this point I proposed a flower-like UI element where, essentially, the UI moves to wherever you touch. Maybe I didn't explain that point enough. To enable the flower you tap and hold anywhere, and a voice will read out commands associated with gestures. For my card game, for instance, it might read out: "Up - hear board, Down - hear hand, Left - flip card, Right - open settings. Drag your finger against the screen in one of the directions." Then if you drag your finger in a direction, maybe up, it will say: "hear board - release finger to confirm". Additionally, using other input such as gyroscope and keyboard inputs can further improve the degrees of freedom in input, as you stated above.

I agree that ideally web developers would have full access to multi-touch, but I worry that could disorient users of assistive technology without very careful effort on the developer's side to explain how else to tab away from the element with direct input enabled (such as swiping from the bottom). One similar case is where CodeMirror gives the option to override the Tab key's behavior for a code editor, to allow the user of the editor to hit Tab to indent. To ensure this isn't an accessibility nightmare, the developer must add text to the page which instructs the user how to tab away from the text input, typically by pressing the Escape key immediately followed by pressing either Tab or Shift+Tab.

> The problem with telling a VoiceOver user to disable their screen reader is that the browser has all kinds of junk (bookmarks, tabs, the address bar) at the top and bottom of the screen, and the user will tap on these junk areas without meaning to, which will take them out of the app.

That's a great point, and it's why I made my app a Progressive Web App, which means you can add it to your home screen to remove all of the browser UI elements from the interface. I can still see the annoyances with not having notifications read out while playing the game, and with the swipe-up-from-bottom gesture (to exit the app) being more sensitive. Like I said, ideally direct input will get merged into the browser and I would be able to use this API.

> Our current work-around is to create an app with a native WebView with direct touch enabled for the view, and enter a URL into the web view. This allows both direct touch and non-direct touch within the web view, but it's a bit janky: in order to exit direct touch, the user needs to start activating the home screen by swiping up from the bottom, then, before lifting their finger, start using the rotor to deactivate direct touch. VoiceOver doesn't have a way for users to easily stop direct touch. On the keyboard, Caps Lock is used as the universal screen reader key, where the user can activate or deactivate direct keyboard input to games and other applications.

It's awesome that you figured out a solution which works for your use case by making a native app to wrap a web view. Unfortunately, I have ideological and logistical reasons why that solution doesn't work for me. I love developing on the web because whatever I make is intensely accessible: there is no app install process necessary (aside from optionally adding it to your home screen), and there is no third-party store which imposes certain rules, restrictions, and approval processes. All I need to do as a developer is upload my code to a CDN, and now my app is available to anyone who can visit my website. Separately, the restrictions that the web browser imposes on apps with sandboxing mean that users, including me, generally trust visiting a website far more than downloading an app.

I think that the cautious flexibility of the web is great, and I think that if you need more control than what the browser gives you, then you should create a dedicated app. I think that breaking built-in VoiceOver multi-touch commands is not worth having multi-touch available to web developers. I totally understand that a map app usually has some form of multi-touch for zooming and panning, but I would argue that a map app should also probably support single-touch gestures, just in case you don't have two fingers free while navigating the world. Apple Maps, for instance, uses a tap quickly followed by a drag to support zooming in and out with one finger. I have been trying to think of places where multi-touch is absolutely a must-have, and I've been generally drawing a blank. If you have an example of a feature that would require custom multi-touch for intuitive interaction in your map app, please by all means share it; I would love to hear about it.