WebXR Raw Camera Access API

w3ctag / design-reviews

W3C specs and API reviews

Creative Commons Zero v1.0 Universal


WebXR Raw Camera Access API #652

Closed bialpio closed 2 years ago

bialpio commented 3 years ago

Ya ya yawm TAG!

I'm requesting a TAG review of WebXR Raw Camera Access API.

This specification introduces a new WebXR Device API capability, namely the Raw Camera Access API. The newly introduced API enables WebXR-powered applications to access camera image pixels, allowing them to leverage this new information to compute custom per-frame visual effects, or take a snapshot of the app-rendered content overlaid with the camera image.

Explainer¹ (minimally containing user needs and example code): https://github.com/immersive-web/raw-camera-access/blob/main/explainer.md
Specification URL: https://immersive-web.github.io/raw-camera-access/
Tests: N/A yet.
Security and Privacy self-review²: https://github.com/immersive-web/raw-camera-access/blob/main/security-privacy-questionnaire.md
GitHub repo (if you prefer feedback filed there): https://github.com/immersive-web/raw-camera-access
Primary contacts (and their relationship to the specification):
- Piotr Bialecki, @bialpio, Google, Editor
Organization(s)/project(s) driving the specification: Immersive Web CG
Key pieces of existing multi-stakeholder review or discussion of this specification: major discussion started in bialpio/webxr-raw-camera-access#1 (reached consensus), continued in immersive-web/raw-camera-access#2 (no further comments)
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5759984304390144

Further details:

[x] I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: N/A.
The group where the work on this specification is currently being done: Immersive Web CG
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Immersive Web WG
Major unresolved issues with or opposition to this specification: No major issues yet.
This work is being funded by: N/A.

You should also know that...

N/A.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

torgo commented 3 years ago

Thanks for sending this our way and thanks for documenting the user needs and filling out the security & privacy questionnaire. One question related to privacy - you note in the explainer that the feature does not currently exist due to privacy. However, when it comes to permissions you state:

If UA decides it needs to prompt the user for permission to use the camera, it can do so at this stage.

It feels to me like this needs to be stronger. Can there be a normative requirement to seek user permission or at least stronger language to that effect? It's good that you call this out in the spec https://immersive-web.github.io/raw-camera-access/#privacy-security but likewise it feels like this should be a stronger requirement. Also it feels like there should be more in this section that enumerates abuse scenarios and how the API proposes to mitigate against these abuses.

bialpio commented 3 years ago

Thanks for the quick response! I've landed on this phrasing since I'm viewing the explainers as a teaser to the devs that presents the API and what some of the steps may imply (i.e. it's important for the devs to know that there likely will be a permission prompt when using this feature). For the detailed descriptions, the spec should be the authority, which leads me to...

In the spec draft itself, I have tried to make it explicit that the UA must seek user consent. Unfortunately, I do not see a good way to include it explicitly in some algorithm, since session creation & seeking consent is handled by the WebXR Device API - the module just extends the core spec to add one more feature descriptor.

In the explainer I'll reword this to be a bit stronger while still being developer-focused, maybe:

As per WebXR Device API, the user agent will seek either explicit or implicit consent before creating a session. This may mean that a permissions prompt will be displayed to the user.

As for the spec, I'll add a note that elaborates on this a bit more to refer implementers to the sections in the core spec that they should be aware of.

kenchris commented 3 years ago

I am a bit worried about us making parallel camera APIs, especially as this seems to be lacking a lot of features that I believe people would want at some later point.

How do I select which camera to use (external vs internal etc)?
How to select quality, SD, HD, 4K?
How to do pan/tilt/zoom etc?

torgo commented 3 years ago

Hi - just coming back to this again. The PR you've referenced above is appreciated. I'm still concerned that with the presence of this API developers will always choose to request raw camera access even if they don't need it, thereby rendering the privacy-preserving aspects of regular camera access meaningless... Can the explainer or the spec go into more detail on why this won't be the case?

bialpio commented 3 years ago

I am a bit worried about us making parallel camera APIs, especially as this seems to be lacking a lot of features that I believe people would want at some later point.

How do I select which camera to use (external vs internal etc)?

How to select quality, SD, HD, 4K?

How to do pan/tilt/zoom etc?

The goal here was to provide something that is tied to the camera that is currently being used by the XR system to provide the AR experience (for example, AR implementation in Chrome currently composes the camera feed with site's WebGL-rendered content, w/o offering any of those configuration knobs, and also w/o exposing the pixels to the site). I imagine that some of those settings would have to be exposed at XRSession creation (quality & camera selection, pan/tilt/zoom would not be available since the current spec text requires that camera image is aligned with XRView), but it may be something that remains fixed for the entire duration of the session (depending on whether it is allowed by the underlying AR framework that the implementations use). There was a long discussion, whose outcome was to pursue a simpler, XRView-aligned camera access API in order to cater for smartphone-specific use cases, while leaving the door open for a more general solution which would integrate with getUserMedia() APIs.
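For illustration, a minimal sketch of what XRView-aligned access could look like per frame, assuming the shape described in the explainer (a `camera` attribute on each view and a `getCameraImage()` method on the WebGL binding). The `*Like` interfaces and the `cameraImagesForViews` helper are hypothetical stand-ins so the sketch type-checks outside a browser; this is not the normative API.

```typescript
// Hypothetical structural types mirroring the shapes described in the explainer.
interface CameraLike { width: number; height: number; }
interface ViewLike { camera?: CameraLike | null; }
interface BindingLike<T> { getCameraImage(camera: CameraLike): T | null; }

// Collect one camera texture for every view that exposes an aligned camera.
// Views without a camera (or bindings that return null) are skipped.
function cameraImagesForViews<T>(
  views: readonly ViewLike[],
  binding: BindingLike<T>,
): T[] {
  const textures: T[] = [];
  for (const view of views) {
    if (!view.camera) continue; // no camera image aligned with this view
    const texture = binding.getCameraImage(view.camera);
    if (texture !== null) textures.push(texture);
  }
  return textures;
}
```

In a real session the views would presumably come from the viewer pose inside the animation frame callback, and the returned textures would only be usable for that frame.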

I'm still concerned that with the presence of this API developers will always choose to request raw camera access even if they don't need it, thereby rendering the privacy-preserving aspects of regular camera access meaningless... Can the explainer or the spec go into more detail on why this won't be the case?

I do not think it is possible for us to guarantee that this won't ever be the case, we can only try to incentivize the developers to ask for what they need. We do not attempt to specify the UX in the normative text, but I'd like the UAs to make the distinction between "lower-privilege session" (w/o granting the app access to camera pixels, but exposing the information that ultimately is derived from camera pixels and IMU sensors) and "higher-privilege session" (the camera pixels are accessible to the app) understandable by the user when displaying permission prompts - if the distinction is clear, the raw-camera-access sessions should hopefully be more frequently rejected by the users. I'll add more text to the explainer.

With all that said, the privacy efforts made by WebXR can all be sidestepped if the app asks for camera via gUM() & the user allows it. :frowning_face:

Sauski commented 3 years ago

We do not attempt to specify the UX in the normative text

I think there's a great benefit to including non-normative descriptions / examples of UX that meet the expectations in the privacy considerations section of specs. Especially when the permissions model / user understanding is so integral to the privacy posture of the spec.

The WebNFC spec is a good example of this kind of approach.

torgo commented 3 years ago

Some additional thoughts. It feels like APIs like the upcoming marker detection API will be important for doing things in a privacy preserving fashion. While I understand that raw camera access will allow developers to do lots of things there need to be appropriate drawbacks and warnings because once you spin up this access you're really giving the web site everything.

I'm thinking especially of the cases where an application is designed to perform a certain function (e.g. you're at a restaurant and you scan a QR code to bring you to a web app that allows you to draw funny ears or hats on people's faces) that requires facial recognition and therefore raw camera, but then uses that same facial recognition data for a secondary use contrary to the user's expectations (e.g. correlating that info with other facial recognition info to build up a list of people who were with you at the table for sale to 3rd parties).

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

bialpio commented 3 years ago

I'm thinking especially of the cases where an application is designed to perform a certain function (e.g. you're at a restaurant and you scan a QR code to bring you to a web app that allows you to draw funny ears or hats on people's faces) that requires facial recognition and therefore raw camera, but then uses that same facial recognition data for a secondary use contrary to the user's expectations (e.g. correlating that info with other facial recognition info to build up a list of people who were with you at the table for sale to 3rd parties).

Yes, that is unfortunately correct. One thing to note is that the scenario you are describing is already possible on the web, without WebXR's Raw Camera Access API in the picture at all, so by introducing it, I don't believe we're weakening the platform.

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

I don't believe we had such discussions within the working group. Do you have specific examples of how that could work? I worry that introducing artificial drawbacks could cause the feature to be unusable, so we need to make sure we strike the right balance here.

Currently, the main limitation of the API that I'd argue falls into the "drawbacks" category is a requirement for the camera texture to align with an XRView - as a consequence, the camera texture has a narrower field of view compared to the image that the site could get using getUserMedia() APIs, & users have clear visual feedback on what exactly is shared with the site because the same texture is displayed to them (this could be suppressed by a malicious app rendering an opaque object across the entire viewport though).

Sauski commented 3 years ago

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

It's already challenging to correctly communicate the implications of the "privacy-preserving" WebXR AR APIs to users. Trying to communicate an intermediate level of protection, such that a user is meaningfully more likely to accept the WebXR raw camera permission over the regular camera permission, seems almost impossible.

Given that, I'm on the side of positioning this more as equivalent to the regular camera permission, and leveraging the strong existing user mental model around what granting camera access means. This seems better than the alternative of creating a new type of very-almost-camera that doesn't have the existing mental model to lean on.

Perhaps it's worth highlighting the fact that the site could have just asked for regular WebXR access, but is explicitly asking for camera, so users don't build the model that raw camera access is "just part of doing WebXR".

torgo commented 3 years ago

Discussing again in our virtual f2f with @kenchris and @atanassov. @kenchris also pointed out that it's not great having 2 different ways to get access to the camera with totally different areas.

On the privacy topic, some other factors:

People who are bystanders don't have a way to opt out of being part of this scenario. The fact that you can be logged into a service and through the use of this API expose other people to privacy threats is problematic.

If people allow access to camera image currently - it's because they want to do a specific thing - usually video call or take a picture or scan a QR code. In AR you are using the camera more freely around you and for a longer period of time thereby exposing more information. It feels more privacy infringing and therefore worthy of greater protection than purely camera access. Also one benefit of WebAR is that anyone can pick up their phone and start using it without additional software download. That lower barrier to entry also calls for a stronger system of privacy protection. It feels like there needs to be some additional mitigation designed into the API - not part of the permissions request but intrinsic to how the API works - that makes it more privacy-protecting than a similar approach on native platforms would be. Maybe that means fuzzing - maybe it means turning off access to something else. There must be privacy differentiators for WebXR that align with the ethical approach of the web - even if that means it's less powerful than a native equivalent.

bialpio commented 3 years ago

People who are bystanders don't have a way to opt out of being part of this scenario. The fact that you can be logged into a service and through the use of this API expose other people to privacy threats is problematic.

What do you mean here by "logged into a service"?

I think I'm missing something - is there a threat model document that I could read up on? All of the concerns that you are listing above are already possible via getUserMedia() if we assume a malicious website (camera web app / QR code scanner web app don't have to stop accessing the camera after it took a picture / scanned the QR code, and the API is accessible w/o additional software download). After a bad actor convinces the user to grant the permission for camera, the user and the bystanders are already potentially compromised - that's not something specific to WebXR's raw camera access. In a way, WebXR offers slightly more protection that is built into the behavior of the API (on smartphones, the camera is by default displayed to the user as well, and the field of view is more limited compared to what the camera could actually capture).

In the end, any API that allows the sites to access camera pixels can be used maliciously, and I'm worried that crippling this particular API will not increase users' security. On the contrary, if it turns out WebXR doesn't have an answer ready for the use cases that require access to pixels, app developers could fall back to relying on gUM() + SLAM algorithms to enable AR scenarios (we know of one existing product that does it now and is blocked from switching to WebXR due to the lack of a camera access API), in which case the UA is entirely out of the loop (while also potentially sacrificing battery life and the quality of experience, if not done right). The only outcome is that the barrier of entry to offer full AR experience while silently capturing the camera feed may be higher than with WebXR’s API, but there can always be a site that advertises itself as awesome-ar-experience.example.com, asks for camera permissions, fakes an attempt to enter AR, & immediately shows an "oops, something went wrong" message, but keeps recording the camera feed (or even falls back to WebXR for AR experience w/o raw camera access, but will start capturing the camera feed as soon as the user leaves the session).

torgo commented 3 years ago

Hi @bialpio - very sympathetic to your point that developers will just use getUserMedia if this API doesn't provide the functionality they need. I think we want to make sure that developers make use of this API (and the WebXR stack in general) when they are doing AR or VR on the web. AR on the web was not possible until long after getUserMedia was written, so perhaps there needs to be a bigger discussion about permissions and prompting and risks of camera access in general - but that shouldn't block this work.

A privacy principles document including some threat model info is in development right now. We expect to have a first public working draft available soon. In the meantime, you can take a look at some of the relevant design principles such as it should be safe to visit a web page and ask users for meaningful consent when appropriate. To be clear: I am concerned about the threat model of web apps that collect more data than they should and use it for purposes other than what the person using the app expects. This concern is significantly amplified when the web application has access to the device's live video feed for a significant period of time. The thrust of the Ethical Web Principles (which was written to inform our other documents such as the design principles) is that the web must be a more ethical environment for people than other platforms (such as native apps) and it's at the design stage that we can make decisions that steer the web in that direction.

What do you think of @Sauski's suggestion that the prompting should highlight to the user that the app has requested special permission over and above regular WebXR AR? Again to be clear I don't think we can solve this issue only by adjusting prompts because of the problematic nature of prompts but it can be one mitigation.

bialpio commented 3 years ago

What do you think of @Sauski's suggestion that the prompting should highlight to the user that the app has requested special permission over and above regular WebXR AR?

I think we're on the same page here, I definitely agree we need to ensure the implementations give sufficient information to the users regarding potential implications of the choices they are presented with and I'll look at WebNFC for inspiration on how to best include this in the spec. I'm mostly wary of including normative text that would mandate the UAs to do one thing or the other, independent of the circumstances. One thing that comes into play here is that WebXR allows the implementations to infer user intent / consent.

To shed some light on the current, behind-the-flag implementation in Chrome: we display a different prompt based on the set of features that an app is requesting to be enabled in a given session (if raw camera access is requested, we will use a prompt that is distinct from what is displayed to the user when an app requires access to the less privacy-invasive features). This is still something that we want to iterate on with our privacy and UX teams, which is one more reason that makes me want to avoid mandating concrete UX in the spec.
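As a rough illustration of the idea of choosing a prompt from the requested feature set, here is a minimal sketch. It is not Chrome's implementation: the `PromptKind` names and the `promptKindFor` helper are hypothetical, and `"camera-access"` is assumed to be the feature descriptor that this module adds.

```typescript
// Hypothetical prompt tiers: a regular AR prompt vs. one that calls out camera pixels.
type PromptKind = "standard-ar" | "camera-pixels";

// Assumption: the feature descriptor string introduced by the raw camera access module.
const CAMERA_ACCESS_FEATURE = "camera-access";

// Pick the more explicit prompt whenever the session asks for camera pixels.
function promptKindFor(requestedFeatures: readonly string[]): PromptKind {
  return requestedFeatures.indexOf(CAMERA_ACCESS_FEATURE) !== -1
    ? "camera-pixels"
    : "standard-ar";
}
```

A session that only asks for, say, hit testing would get the standard prompt, while any session that includes the camera feature would get the distinct one.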

Again to be clear I don't think we can solve this issue only by adjusting prompts because of the problematic nature of prompts but it can be one mitigation.

I'd like for the mitigations to stay on the UX side of things, but that does not necessarily mean that they will be limited to consent prompts (one other example: displaying some visual indicator for the entire duration of a session in which the camera pixels are accessible to the app). Unfortunately, I do not think it would be possible to introduce the limitations in the API shape itself without causing the API to fail to meet its purpose, but I admit the only thing that currently comes to my mind is to throttle how often the site would be able to get the camera texture.
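To make the throttling idea concrete, a minimal sketch of a gate that a UA could consult before handing out a camera texture. Nothing like this is in the spec; the `CameraTextureThrottle` class and its `minIntervalMs` parameter are hypothetical, and timestamps are passed in explicitly to keep the sketch deterministic.

```typescript
// Hypothetical gate limiting how often a camera texture may be handed to the site.
class CameraTextureThrottle {
  private lastGrantMs: number | null = null;
  private readonly minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if a texture may be handed out at time nowMs (milliseconds),
  // and records the grant; returns false if the previous grant was too recent.
  tryAcquire(nowMs: number): boolean {
    if (this.lastGrantMs !== null && nowMs - this.lastGrantMs < this.minIntervalMs) {
      return false;
    }
    this.lastGrantMs = nowMs;
    return true;
  }
}
```

With a 100 ms interval, for example, a site rendering at 60 Hz would see a fresh camera texture on roughly one frame in six or seven, which illustrates why heavy throttling could defeat per-frame effects.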

In general, do you think that the problem with the current spec lies in the API shape itself (implying that we should change something in it, including at the Web IDL level), or do you expect that we should be able to address the concerns by ensuring that the users can make an informed decision and keep being informed about the camera being in use (implying that the solution can stay at the UX level)?

torgo commented 2 years ago

Hi folks - I read this today and felt it was relevant to our discussion https://www.eff.org/issues/xr. EFF are pointing out some privacy risks associated with XR (in general, not WebXR specifically) and call for (among other things) "privacy-by-design engineering". I think that's what we're doing here and what the TAG are calling for when it comes to the design of this API.

torgo commented 2 years ago

Hi @bialpio - has there been any activity on your end on additional privacy mitigations since we spoke last week?

rhiaro commented 2 years ago

Another option (as well as the visual indicator) for alerting the user - and in particular other people in the vicinity whose privacy may unknowingly be compromised - could be to require a sound, in a similar way to how camera apps are required to make the shutter sound in certain jurisdictions.

What other web platform APIs could be completely prohibited from use while the Raw Camera Access API is in use, without completely removing the utility of Raw Camera Access? Something like this might encourage developers to only use it if they really really need to, as well as protect privacy by restricting what other kinds of information can simultaneously be transmitted back to the origin.

bialpio commented 2 years ago

Hi @bialpio - has there been any activity on your end on additional privacy mitigations since we spoke last week?

Not yet, I still need to reach out internally to get some guidance from the UX team and Privacy team on how we could ensure we communicate that the camera is in use to the users. We'll likely postpone the OT until we have something that we can show to the developers (part of the reason for an OT is to also get feedback on the UX of the API, so it makes no sense to show something that we aren't sure is final).

Another option (as well as the visual indicator) for alerting the user - and in particular other people in the vicinity whose privacy may unknowingly be compromised - could be to require a sound, in a similar way to how camera apps are required to make the shutter sound in certain jurisdictions.

This may be something worth exploring, although I imagine emitting a sound w/ some regular cadence for the entire duration of a session could get tiresome to the users. Are we worried that a malicious app would be able to drown this sound out? At least by doing it, it will be obnoxious to the bystanders.

What other web platform APIs could be completely prohibited from use while the Raw Camera Access API is in use, without completely removing the utility of Raw Camera Access?

We'd need to lock down any kind of API that allows communication with the outside world (XHR, fetch(), maybe history?), and any kind of API that allows the app to persist state (local storage, session storage, file / filesystem?), otherwise, we're risking that whatever was extracted from the camera feed during XR session gets leaked to a server after the session has finished. We'd also need to clear the state of the script once the XR session has ended (so probably reload the site on session end).

torgo commented 2 years ago

Ok thanks for letting us know the status. Regarding locking down other APIs - is this something you're actively looking into? Would you, for example, document that in your privacy considerations? In any case, please ping us when you have an update. Meanwhile we will put this on the agenda for our next f2f in mid December to circle back to.

tangobravo commented 2 years ago

Is there any distinction to be made between headsets and handheld devices here?

Currently with handheld "WebAR" experiences based on getUserMedia, the camera frames are composited with content into a WebGL canvas. That canvas can be captured via captureStream() and Media Recorder, and then shared via Web Share. This is all with using client-side APIs, and with user consent for camera access. It seems the right balance to me between privacy and capability. It effectively allows many of the fun "AR filter" effects from social media apps to be available on the web. It's also useful for applications like product visualisation, combined with an easy photo capture + share mechanic to send on to a family member.

It would be a shame if WebXR sessions (which have the potential to offer better quality tracking with lower power usage) were not able to be used in the same way, I don't see any major difference privacy-wise on handheld devices vs the getUserMedia approach.

It seems a non-starter to have an audible announcement every 10 seconds "RECORDING IN PROGRESS" for any use of the camera in the web - that would get annoying very quickly in Google Meet calls...

At least in the EFF article the main concerns seem to be around the always-on nature of headsets. One option would be strongly recommending some form of visual indicator that is visible to bystanders when camera access is used? Not all hardware (Oculus Quest?) would have the capability, but it's likely more of a concern for some future always-on AR eyewear.

rhiaro commented 2 years ago

Thanks for your work on this @bialpio. We have left some feedback on the PR about the strength of the statement about the privacy indicator, but overall we are very happy with the direction this is going.

torgo commented 2 years ago

Thanks for taking our feedback into consideration! We really appreciate the time and energy you've put in on this. We look forward to seeing the work progress.