Define how to request features which require user consent

immersive-web / webxr

Repository for the WebXR Device API Specification.

https://immersive-web.github.io/webxr/

Define how to request features which require user consent #424

Closed toji closed 5 years ago

toji commented 5 years ago

We've identified that there will be multiple features that WebXR will want to expose over time that will require some form of user consent. This subject has gone back and forth a LOT, and I'm not going to try to capture all of the discussion again here, but instead present my current thinking and iterate from there.

We want the request for these features to have certain properties (Partially pulled from #330):

We want to avoid a scenario where some browsers are forced into presenting the user with a sequential series of dialogs for common scenarios. An example of this "forced" behavior is if the creation of a session requires a permission prompt, and then calling a method on the session also requires a permission prompt. The UA can't do anything to make the presentation less jarring because the session has to fully resolve before the next permission is even requested.
Ideally there's a way to allow the browser to bundle all the necessary permissions into a single dialog if desired/applicable.
The system should also allow for bespoke feature requests at the time they're actually necessary (i.e., not requesting access to RGB camera data until the user clicks a "take a photo" button).

The most straightforward way I see to address that is to simply pass the requested features into the session request call, either as dictionary keys or possibly as an array of strings. For the moment I'm going to suggest the dictionary key route:

```javascript
navigator.xr.requestSession({
  mode: 'immersive-ar',
  lightingEstimation: true,
  environmentMeshing: true
}).then(/* ... */);
```

For bespoke feature requests made afterwards, there would be an explicit function that takes the same dictionary arguments (minus the session mode):

```javascript
xrSession.requestFeatures({
  lightingEstimation: true,
  environmentMeshing: true
}).then(/* ... */);
```

The UA should not reject the session if any of these features are either not supported or the user does not consent to giving access to them. Instead, the session should still be created as normal, and the mechanism for accessing the feature once enabled should return null, reject, report an error, or use whatever failure mode is appropriate. No differentiation should be made between the feature not being supported and the user denying consent, to avoid leaking information about the user's system without their permission. That way we're encouraging developers to be responsive to varying feature sets. If a feature is deemed to be absolutely critical for developers to identify prior to starting a session, it should be handled by the mechanism discussed in #423.
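A minimal sketch of the UA-side rule described above, assuming a hypothetical `resolveFeatures()` helper (not part of any spec) that receives the requested dictionary, the set of features the device supports, and a callback standing in for the consent prompt. Session creation never fails here; unsupported and declined features both come back as `false`.

```javascript
// Hypothetical UA-side helper: decides which requested features a new
// session actually gets. It never throws; a feature that is unsupported and
// one the user declined both come back as `false`, so the page cannot tell
// the two cases apart.
function resolveFeatures(requested, supportedFeatures, userConsents) {
  const granted = {};
  for (const [name, wanted] of Object.entries(requested)) {
    if (!wanted) continue; // key present but not actually requested
    // `&&` short-circuits: the user is only asked about supported features.
    granted[name] = supportedFeatures.has(name) && userConsents(name);
  }
  return granted;
}

const granted = resolveFeatures(
  { lightingEstimation: true, environmentMeshing: true },
  new Set(['lightingEstimation']), // this device cannot do meshing
  () => true                       // the user accepts every prompt
);
console.log(granted); // logs: { lightingEstimation: true, environmentMeshing: false }
```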

To that end, it's natural to draw some parallels between this and #423, and as such it feels like they should both use the same mechanism for identifying features (and maybe even share enums), but at the moment I'm going to recommend against that. For one, use of dictionary keys makes it possible to make the permission requests more expressive. For example: lightingEstimation: 'ambient' instead of a simple boolean. Also, since the request model should allow session creation to still succeed even when the feature isn't supported, dictionaries provide the right implicit guarantee: keys that aren't understood will be safely ignored. If we used an array of enums, then passing an unrecognized enum would cause the call to fail. (Desirable for requirements, not here.) We could get around that by simply stating that an array of strings is passed instead, but I've already gotten feedback that this is seen as a bit weird in terms of web platform ergonomics.
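To illustrate the difference between the two argument shapes, here is a simplified emulation of how the bindings would treat each one. The helper names and the `KNOWN_FEATURES` list are hypothetical; the behavior mirrors WebIDL, where unknown dictionary members are ignored and an unrecognized enum value throws a `TypeError`.

```javascript
// Features this (hypothetical) UA understands.
const KNOWN_FEATURES = ['lightingEstimation', 'environmentMeshing'];

// Dictionary style: only known members are read; unknown keys are dropped.
// Values can be richer than booleans (e.g. lightingEstimation: 'ambient').
function convertFeatureDictionary(dict) {
  const out = {};
  for (const key of KNOWN_FEATURES) {
    if (key in dict) out[key] = dict[key];
  }
  return out;
}

// Enum-array style: one unrecognized value makes the whole call throw.
function convertFeatureEnumArray(list) {
  for (const value of list) {
    if (!KNOWN_FEATURES.includes(value)) {
      throw new TypeError(`'${value}' is not a valid feature enum value`);
    }
  }
  return list.slice();
}

console.log(convertFeatureDictionary({ lightingEstimation: 'ambient', someFutureFeature: true }));
// logs: { lightingEstimation: 'ambient' }
try {
  convertFeatureEnumArray(['lightingEstimation', 'someFutureFeature']);
} catch (e) {
  console.log(e.name); // logs: TypeError
}
```

`someFutureFeature` stands in for a key added in a later version of the spec: an older UA silently ignores it in the dictionary form but rejects the whole call in the enum form.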

blairmacintyre commented 5 years ago

I like this. One question: if a feature isn't available (either not supported, or because the user declines permission), how is this exposed?

When thinking about geoAlignment, I imagined requesting it (in this case, adding geoAlignment: true into your list of features) and then having a way to query the resulting session. One way might be to add a features property to the resulting session that contains the same set of properties as was requested (e.g., mode, lightingEstimation, environmentMeshing in your requestSession example above) with the corresponding values for the session.

This means, for example, that an app knows immediately whether a browser supports meshing, as opposed to simply not having received a mesh yet. In the former case, it can do what it needs to do to work without meshing. In the latter case, it might display a floating hint asking the user to "look around" (or whatever).
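A sketch of how an app might use the `features` property proposed above. Both the property and its shape (echoing each requested key with the granted value) are assumptions taken from this comment, not a shipped API; `meshingStatus()` and the mock session are hypothetical.

```javascript
// Assumes the session exposes `features`, echoing each requested key with
// the value actually granted for this session (a proposal, not a real API).
function meshingStatus(session, meshesReceived) {
  if (!session.features.environmentMeshing) {
    return 'fallback';    // meshing unavailable: run without it
  }
  if (meshesReceived === 0) {
    return 'look-around'; // meshing granted, but no data yet: hint the user
  }
  return 'ready';
}

// Mock object standing in for a real XRSession.
const mockSession = {
  features: { mode: 'immersive-ar', lightingEstimation: true, environmentMeshing: true },
};
console.log(meshingStatus(mockSession, 0)); // logs: look-around
```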

Aside: can we add "geoAlignment: true," to these examples, since it's a pretty good candidate for this approach? There are a lot of platforms that might not support this, or that might support it only intermittently. In particular, on a device like HoloLens or ML1, with no compass, supporting geoAlignment of the coordinate system may be impossible to do transparently at the framework level, since there will be no sensor for heading. But one could imagine the platforms extending their mapping capabilities to allow some sort of offline (manual or automatic) alignment of maps with geospatial coordinates in the future; in such a case, geoAlignment might return true ("it's possible"), but a separate geoAlignment property on the session might be true or false depending on whether the coordinate frames are currently aligned. I can imagine other features, beyond meshing and alignment, might have a similar need to differentiate between "feature available and allowed" and "feature currently active".
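The distinction between "feature available and allowed" and "feature currently active" can be modeled as two separate values. This is a hypothetical helper, assuming `granted` is fixed when the session is created and `active` is a live flag the UA updates during the session.

```javascript
// Hypothetical two-level state for a feature such as geoAlignment.
// `granted` is decided once, at session creation; `active` may change
// during the session (e.g. as the platform's map becomes aligned with
// geospatial coordinates, or loses that alignment).
function geoAlignmentStatus(granted, active) {
  if (!granted) return 'unavailable'; // unsupported or not allowed: same answer
  return active ? 'aligned' : 'not-yet-aligned';
}

console.log(geoAlignmentStatus(true, false)); // logs: not-yet-aligned
```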

cwilso commented 5 years ago

One more question - if one or more features require user consent, I'm presuming the resolution of the session creation will async wait until the user consent resolves or rejects. Are we concerned that this might expose (through timing) the difference between not supported and not allowed?

toji commented 5 years ago

The presumption about the promise not resolving until consent is given is correct. Good point about the timing "attack", though I'm not sure how big of a concern it is?

Talking it through: Let's say the goal of a malicious app is to sniff out the presence of various hardware features even if the user doesn't want to expose them. We've already established one safety mechanism in that the only time at which they can do it is during immersive session creation, which must happen on user activation and has extremely visible side effects, so spamming the test isn't practical. Also, since any errors won't tell you which features failed you can't reasonably sniff out more than one at a time, and that's only if your "bait" experience doesn't use anything but the core API. Additionally, on some browsers you may still get a consent prompt even when no features are requested. (We've talked about always having some basic consent process for immersive-ar on mobile Chrome, for instance.) Even in absence of that there's a possibility that cached or UA-level settings could kick in and change the timing behavior, but that's a theoretical mitigation.

So, that means that you can potentially use a timing attack to infer feature presence if and only if:

You are on a browser that doesn't always gather user consent.
You ask for only one feature.
You actually have some sort of real experience to back it up, in case the session request succeeds.
You know the UA isn't intervening in some other way.

At which point you can gain a fuzzy idea of a single bit of data. Which seems like... a lot? I'm having a hard time imagining a feature for which knowledge of its presence (without the ability to use said feature) is so valuable that someone would actually go through the effort. { requiredFeatures: 'owned-by-high-level-government-official' } 😉
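One way a UA could blunt the timing concern is to always go through a single bundled consent step, as some browsers may do anyway, and only check support afterwards. The sketch below assumes a hypothetical `askUserOnce()` callback standing in for the prompt (synchronous here for simplicity; a real prompt would be asynchronous). The point is that the prompt runs exactly once per request, whether or not the device supports anything that was asked for.

```javascript
// Hypothetical UA-side resolution with exactly one consent step per request.
// askUserOnce(names) stands in for a bundled prompt and returns the Set of
// names the user approved.
function resolveWithSingleConsentStep(requested, supportedFeatures, askUserOnce) {
  const names = Object.keys(requested).filter((name) => requested[name]);
  // The prompt is shown even if none of the requested features is supported,
  // so feature support does not decide whether a prompt (and its delay) occurs.
  const approved = askUserOnce(names);
  const granted = {};
  for (const name of names) {
    granted[name] = approved.has(name) && supportedFeatures.has(name);
  }
  return granted;
}

const granted = resolveWithSingleConsentStep(
  { lightingEstimation: true, environmentMeshing: true },
  new Set(['lightingEstimation']),
  (names) => new Set(names) // the user approves everything shown
);
console.log(granted); // logs: { lightingEstimation: true, environmentMeshing: false }
```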

klausw commented 5 years ago

I agree with @blairmacintyre that there needs to be a way for the app to know which of the requested features are available, especially in cases such as environment meshes or even simple hit tests where actual data may be initially unavailable.

On the implementation side, I'm a bit concerned about xrSession.requestFeatures adding features at runtime. Would there be a way for an implementation to say that it could support this feature, but only if it's requested at session creation time? In some cases I suspect it may not be reasonably possible to switch configurations inside an already active session. While the user agent could simply respond to every runtime request with "not supported", this isn't very helpful to the user (or application) if there's no way to communicate this.
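A sketch of what such a signal could look like, assuming a hypothetical `CREATION_TIME_ONLY` set and a runtime request that reports which features would need a new session rather than answering a bare "not supported". None of these names exist in the API.

```javascript
// Hypothetical: features this implementation can only configure when the
// session starts (e.g. because the underlying runtime must be reinitialized).
const CREATION_TIME_ONLY = new Set(['environmentMeshing']);

// Sketch of a runtime request that tells the app which features would
// require exiting and re-entering the session.
function requestFeaturesAtRuntime(requested, enabledAtCreation) {
  const result = { granted: [], requiresNewSession: [] };
  for (const [name, wanted] of Object.entries(requested)) {
    if (!wanted) continue;
    if (enabledAtCreation.has(name)) {
      result.granted.push(name);            // already on for this session
    } else if (CREATION_TIME_ONLY.has(name)) {
      result.requiresNewSession.push(name); // cannot be switched on now
    }
    // Other features would go through the normal consent path (omitted).
  }
  return result;
}

const result = requestFeaturesAtRuntime(
  { lightingEstimation: true, environmentMeshing: true },
  new Set(['lightingEstimation'])
);
console.log(result.requiresNewSession); // logs: [ 'environmentMeshing' ]
```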

About "Not requesting access to RGB camera data until the user clicks a "take a photo" button", I think the app could simply ask for the "RGB camera data" feature at session creation time, but the user could still have multiple ways to get a similar result:

User trusts the app (maybe due to having used it before) and says yes at session creation.
User declines the feature. If the user realizes they do like the app and want to take a photo, they could exit the session and re-enter it, enabling the feature this time.
User tells the user agent to pretend to have the feature, but supply synthetic camera data or severely downgraded real camera data instead. The user agent could switch to actual full-quality camera data at runtime at the user's request. This would be transparent to the application from the API side. (The app could potentially detect this change by analyzing received camera images, but it wouldn't be involved in requesting the runtime change.)

Bigger picture, how important is it to support this runtime feature selection? Would it make sense to just start out with everything being requested at session creation time for the initial launch, and tackling dynamic features in a followup? If we can make exiting/re-entering sessions pretty quick and seamless, would that be good enough? Note that exiting/re-entering wouldn't necessarily involve taking off a VR headset, i.e. Daydream's VR browser makes this pretty seamless.

@toji wrote:

At which point you can gain a fuzzy idea of a single bit of data. Which seems like... a lot?

I'm not sure how to parse that - did you mean "seems like a lot of effort to get a single bit of data"?

blairmacintyre commented 5 years ago

To be clear, I was trying to say that the app should know if the session supports a feature, not that the browser/device supports a feature. I do not think that the session should indicate what features the device is capable of.

So, a session would say geoorientation = false if the session doesn't support it; this probably means the browser doesn't support it, since it's not something I think we need permissions for.

But, a session might say facedetection = false if the session doesn't allow it, perhaps because the user denied it, perhaps because the session doesn't support it.

klausw commented 5 years ago

I fully agree that this should all be based on per-session support: if the user (or the user agent on behalf of the user) decides to keep a feature disabled for a session, this should look the same from the app's perspective as if the device inherently couldn't support the feature due to hardware limitations. Also, an app should be able to deal with a capability change when the user stops and restarts a session. This may be due to the user examining an application first before deciding to grant advanced environment access for a second session.

Conversely, I'd also be open to the idea that a user agent could claim support for features that are actually emulated, e.g., 6DoF movement via thumbstick locomotion for users who don't have enough space for roomscale or have mobility restrictions. There may be cases where this would be undesirable for an application, for example in a competitive multiplayer game, but arguably this kind of problem already exists on the web platform, where the user could be running a modified browser.

johnpallett commented 5 years ago

It's not clear whether bespoke feature requests (after session creation) can be supported across all platforms in a way that gives developers a predictable cross-platform user experience.

For example, different platforms may give the user agent different options for how to present a trusted interface, but those differences can result in significantly different user experiences:

On an HMD with a handheld input device, a user agent might reserve certain inputs (e.g. a home button) as an affordance for the user to confirm that a user interface can be trusted;
On a standalone HMD without a handheld input device, a user agent might require that the immersive session be suspended in order to present a trusted interface;
On a desktop or mobile device without an HMD, similar to HTML5 fullscreen the user agent may have no alternative but to suspend/resume the session in order to present a trusted interface (one analysis for HTML5 fullscreen is here);
On a tethered HMD, a user agent might preserve the session, but may or may not require the user to remove the headset in order to present a trusted interface on the desktop;
On other form factors such as CAVE systems there may be additional input constraints or requirements for the persistence of the session that limit the ability for the presentation of a trusted interface (or the ability to gain consent) in other ways. In the extreme, such systems may not physically be capable of supporting bespoke feature requests.

This divergence could lead to significant challenges for cross-platform development because developers may not be able to predict how user consent will impact the user experience. For example, a developer who builds an experience on an HMD with an input device (and tunes the experience for those consent affordances) may not realize that the same experience on a tethered device might require the user to remove the headset for every instance of user consent. In extreme cases, a developer may require a bespoke feature that the user physically cannot consent to on certain platforms, rendering those platforms inoperable for those parts of the experience in ways that the developer cannot predict.

More generally, because different platforms may introduce different levels of disruption or consent capability, it seems that the only pattern that a developer could safely use for cross-platform development is to request the desired features at time of session creation. Doing otherwise would give an unpredictable cross-platform experience with varying levels of discomfort for the user that the developer may not be able to predict.

Further, bespoke feature requests lead to other concerns, including:

Fatigue from over-prompting during the WebXR session. Among other concerns, sites can exploit fatigue or confusion to trick users into giving long-running permissions (e.g. camera, location) by interleaving them with WebXR consents;
The tension between comfort and expressiveness will likely vary between platforms, making it impossible for some platforms to ensure that the user is actually giving informed consent without disrupting the experience.

@avadacatavra curious to get your thoughts here.

blairmacintyre commented 5 years ago

FWIW, in my current implementation of some of this (to try it out) on the new version of the WebXR Viewer, I've settled on the following:

requestSession includes requests for worldKnowledge, camera (which includes worldKnowledge implicitly), alignEUS (to align with the world). worldKnowledge includes native sensing like faces and images, as well as illumination estimation.
these result in one of 3 permissions: basic (just motion plus anchors and hittesting), worldKnowledge (same, plus geometry, illumination and built in sensing) and camera. alignEUS works with all.
the permission dialog presents the option of a "Lite" mode for the first two, where the user selects a single plane. The app doesn't know that it's only getting one plane, but that's all it gets.
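The request-to-permission mapping described above can be sketched as follows. The tier names come from this comment; the function itself is an illustrative assumption, not the WebXR Viewer's actual code.

```javascript
// Illustrative mapping from session request flags to one of the three
// permission tiers. `camera` implicitly includes worldKnowledge.
function permissionTier(request) {
  if (request.camera) return 'camera';
  if (request.worldKnowledge) return 'worldKnowledge'; // + geometry, illumination, faces, images
  return 'basic';                                      // motion, anchors, hit testing
}

// alignEUS is independent of the tier and can be combined with any of them.
const request = { worldKnowledge: true, alignEUS: true };
console.log(permissionTier(request), Boolean(request.alignEUS)); // logs: worldKnowledge true
```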

Part of the goal is to allow all the perms to be dealt with at once. There are various ways this could be sliced and presented, but we're currently testing out dialogs that look like this:

[Image: webxr-perms-layout]

We also show the current state in the URL bar (when it's visible), and tapping on an icon allows you to change (downgrade) the permissions; the change is not passed to the page, it simply sees (less) data until the user changes it again. This allows for stopping/restarting the flow of data (say, if you're in a sensitive area, or someone wants you to stop using the camera).

[Image: webxr-perms-layout-icon]
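A sketch of the runtime downgrade behavior: the page is never notified; it just sees less data while a lower tier is selected. The frame shape, `TIER_RANK`, and `filterFrameData()` are hypothetical, and applying the single-plane "Lite" restriction whenever planes are exposed is an assumption.

```javascript
// Hypothetical ordering of the three permission tiers.
const TIER_RANK = { basic: 0, worldKnowledge: 1, camera: 2 };

// Per-frame filter applied by the UA. The current tier can be lowered at
// any time from the URL bar; the page only ever sees the filtered result.
function filterFrameData(frame, currentTier, litePlaneId = null) {
  const rank = TIER_RANK[currentTier];
  const out = { pose: frame.pose, anchors: frame.anchors }; // always available
  if (rank >= TIER_RANK.worldKnowledge) {
    out.planes = litePlaneId === null
      ? frame.planes
      : frame.planes.filter((p) => p.id === litePlaneId); // "Lite": one plane
    out.lightEstimate = frame.lightEstimate;
  }
  if (rank >= TIER_RANK.camera) out.cameraImage = frame.cameraImage;
  return out;
}

const frame = {
  pose: 'pose', anchors: [], planes: [{ id: 1 }, { id: 2 }],
  lightEstimate: 0.8, cameraImage: 'pixels',
};
console.log(Object.keys(filterFrameData(frame, 'basic'))); // logs: [ 'pose', 'anchors' ]
```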

(I have a blog post pending, which is why I have these side-by-side images 👍 )

toji commented 5 years ago

Wanted to leave a quick comment thanking John for reviving this important issue with well researched recommendations. I'd also like to point out that one of his principal conclusions, that we should not allow for bespoke feature requests, directly contradicts one of my stated goals earlier in the thread. Despite that, after John graciously took the time to talk through the reasoning for that recommendation with me, I find that I agree with his conclusion, especially as we consider how varied the immersive hardware ecosystem is.

I anticipate that, like me, other WG members will have concerns about front-loading consent for all of the features used by a session, and I'd encourage you to voice them. We want to make sure we're doing what we can to provide a reasonable support path for as many compelling use cases as possible while keeping users safe and informed, and we can only get there if we're considering a diverse range of UAs, devices, and experiences.

avadacatavra commented 5 years ago

@johnpallett You make a good point about the different ways various devices present a trusted interface. Given that, I agree that it's sensible to prompt for consent on session creation. I really like Blair's approach to presenting the permissions at one time, but separated, and allowing for easy toggle.

One question I have---how would this work with immersive navigation? For example, I start an immersive session with foo.com, then navigate (while still in immersive mode) to bar.com. Would this trigger a new webxr session (thus causing the discomfort due to different trusted interfaces) or would this transfer the foo.com permissions to bar.com?

johnpallett commented 5 years ago

@avadacatavra if the user navigates to bar.com and the immersive session will share data that requires user consent, then I believe there'd be the same discomfort due to different trusted interfaces (regardless of whether or not there is a new WebXR session).

It may be that cross-origin navigation would suffer from the same discomfort regardless of consent, though. Even if user consent was not required (for example, if bar.com had already received consent in the current browsing context), the user would presumably need some indication of what origin they were visiting from a trusted interface; otherwise any origin could pretend to be TrustedSite.com (even if the origin is actually BadSite.com) and solicit sensitive information.
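The reasoning above, that cross-origin navigation needs a trusted interface even when no new consent is required, can be sketched as a small UA-side check. `trustedUIReasons()` is a hypothetical helper; it uses the standard `URL` class to compare origins.

```javascript
// Hypothetical UA-side check run when an immersive session navigates.
// Returns the reasons a trusted interface must be shown (empty = none).
function trustedUIReasons(fromUrl, toUrl, needsConsent) {
  const reasons = [];
  if (needsConsent) reasons.push('consent');
  // Even with consent already on record, the user must see the new origin,
  // or BadSite.com could impersonate TrustedSite.com.
  if (new URL(fromUrl).origin !== new URL(toUrl).origin) reasons.push('show-origin');
  return reasons;
}

console.log(trustedUIReasons('https://foo.com/', 'https://bar.com/', false)); // logs: [ 'show-origin' ]
```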

This is discussed to some degree in the navigation explainer and the navigation repo, but I don't believe there is a proposal for how to address discomfort and different types of trusted interfaces during cross-origin navigation. I've added navigation/#5 to make sure this question is captured.

probot-label[bot] commented 5 years ago

This issue is fixed by PR #739

NellWaliczek commented 5 years ago

Closed by #739