@quidity thank you for this security review, much appreciated. We've tentatively scheduled this topic for discussion on our next meeting 13 January 2022 - 15:00-16:00 UTC. I acknowledge this may not be a reasonable meeting time for you, so we'll follow up in this issue afterwards.
We discussed these security review questions on WebML WG Teleconference – 27 January 2022: https://www.w3.org/2022/01/27-webmachinelearning-minutes.html#t01
The WG will follow up with clarifying questions and proposed changes. This security review is tracked as part of wide review https://github.com/webmachinelearning/webnn/issues/239 of the WebNN spec.
On behalf of the group, I want to thank @quidity and Chrome Security team for this security review!
@quidity, as shared earlier, the group discussed your Security Review feedback and here's a summary of the group's thinking and proposed next steps. If you think we're going in the right direction, we'd be happy to know that, and even more so if our proposed next steps need course correction!
The group also had some questions to better understand some parts of the feedback, in bold below.
Thanks for your review once again!
- I think I disagree with the answer given in 2.9 in https://github.com/webmachinelearning/webnn/blob/main/security-privacy.md. This is introducing a new scripting language, built up in js then executed in a number of different interpreters (“cpu”, “gpu” etc.). The complexity of the operations is such that the likelihood of errors that ultimately result in out of bounds accesses is high, and malicious sites will have significant control over operations. This could be addressed more directly in the security/privacy explainer.
The group agreed with the out-of-bounds access concern and proposed to address it in the test suite and to note OOB risks in the Security Considerations.
The group did not agree with the notion that the API introduces a new scripting language and would like to understand which aspects of the API are of concern.
- Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance, if bounds checks are elided, this could lead to out-of-bounds accesses. It would be good for there to be operation-level metadata that implementers could consume to help prevent such problems.
The group will review each operator identifying OOB access issues and add appropriate guidance to implementers for mitigations.
The group was not sure what "operation level metadata" means exactly in this context.
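To make the shape-change hazard concrete, here is a minimal, hypothetical sketch (plain Python, not the WebNN API; all names are illustrative) of the kind of build-time shape inference an implementation can perform so that a slice with invalid bounds is rejected before any computation runs:

```python
# Illustrative sketch (not the WebNN API): validate slice bounds at
# graph-build time so later operations cannot rely on stale shapes.

def infer_slice_shape(input_shape, starts, sizes):
    """Return the output shape of a slice, or raise if out of bounds."""
    if not (len(input_shape) == len(starts) == len(sizes)):
        raise ValueError("rank mismatch")
    out = []
    for dim, start, size in zip(input_shape, starts, sizes):
        if start < 0 or size < 0 or start + size > dim:
            raise ValueError(
                f"slice [{start}:{start + size}] out of bounds for dim {dim}")
        out.append(size)
    return out
```

Performing this check once per operation at build time, and recording the inferred output shape for downstream operations, is one way later ops can avoid making incorrect assumptions about tensor shapes.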
- The universe of operations is likely to vary in the future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance, if they turn out to be badly implemented)?
The group proposed to document such design principles in the Security Considerations.
To make possible future deprecation easier, the high-level functions can be decomposed into smaller operations as described in the explainer. That is, possibly problematic ops could be deprecated and replaced with a polyfill implementation (likely trading some performance for security).
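As a hypothetical illustration of such a decomposition (plain Python on lists, not the WebNN API): a high-level op like softmax can be expressed through lower-level primitives (subtract, exp, reduce-sum, divide), so a problematic built-in implementation could in principle be swapped for a polyfill at some performance cost:

```python
import math

# Hypothetical polyfill: express a high-level "softmax" op through
# lower-level primitives (sub, exp, reduce-sum, div).
def softmax_polyfill(xs):
    m = max(xs)                            # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]   # element-wise exp
    total = sum(exps)                      # reduce-sum
    return [e / total for e in exps]       # element-wise div
```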
- It feels like .build and .compute should be asynchronous in all cases?
This design consideration is being discussed in https://github.com/webmachinelearning/webnn/issues/230 and your feedback is taken into consideration.
- New side channels will be made available from shared resources (CPU/GPU). Timeable things should be out of process, so they incur at least some IPC to achieve anything. Probably not a massive worry when compared with already sharing a CPU between processes running renderers.
The group agrees this is an area worth clarifying; any relevant guidance to implementers will be documented in the Security Considerations.
The group would like to understand how running timeable things out of process works as a mitigation.
- Verify: Sites must delegate permission to host/run models.
Correct. The permissions policy integration in place means the top-level document must delegate permission for an iframe to run a model using this API.
- Verify: No serialization or caching yet - although this is likely in future.
Correct. No such API affordances are exposed; serialization and caching are out of scope for the spec. That said, the group is investigating whether implementation-level caching warrants guidance to implementers.
- Control over how a model is run (selecting CPU/GPU/TPU, say) - is this too much power for the consuming site? It will, for instance, make it possible to more directly target a flawed implementation. It's not clear why this is required.
The device selection is a hint only (the actual device selection is left to the implementation) and there's no device enumeration API. This partially mitigates the concern.
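To illustrate why a hint mitigates the targeting concern: an implementation is free to ignore the requested device and fall back, so a site cannot force execution on a specific, possibly flawed, backend. A hypothetical sketch (illustrative names, not the WebNN API):

```python
# Hypothetical sketch: the caller's device preference is only a hint;
# the implementation picks the actual backend and may fall back.
AVAILABLE_BACKENDS = ["cpu"]  # e.g. no usable GPU driver on this machine

def select_backend(preferred):
    """Honor the hint when possible, otherwise fall back to a default."""
    if preferred in AVAILABLE_BACKENDS:
        return preferred
    return "cpu"  # safe default; the site cannot force a specific backend
```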
Sub issues:
- #242
- #243
- #244
Please find the group's comments in these sub issues.
I started a draft PR to incorporate this feedback into the Security Considerations of the specification: https://github.com/webmachinelearning/webnn/pull/251
This PR, once reviewed and merged, will help tick the first box in the Security wide review we're tracking as part of https://github.com/webmachinelearning/webnn/issues/239
To tick the second Security wide review box, we'll explicitly request an "official" (formerly Web Security IG) Security wide review to get even more eyeballs to scrutinise the security aspects of the spec.
@quidity, we'd appreciate if you could help answer the questions the group had with respect to your review questions, highlighted in bold in https://github.com/webmachinelearning/webnn/issues/241#issuecomment-1025952050, also copied below for your convenience:
The group did not agree with the notion that the API introduces a new scripting language and would like to understand which aspects of the API are of concern.
The group was not sure what "operation level metadata" means exactly in this context.
(Would addressing https://github.com/webmachinelearning/webnn/issues/243 satisfy all "operation level metadata" requirements?)
The group would like to understand how running timeable things out of process works as a mitigation.
Thank you for your help in hardening the security of this API!
The group did not agree with the notion that the API introduces a new scripting language and would like to understand which aspects of the API are of concern.
My security concern is that the network is compiled into a program that is entirely under an attacker's control. While not as powerful as JavaScript, it is likely powerful enough to make some exploits easier as a result. It argues for careful implementation, and for avoiding introducing too much control over control flow at the API side. Hopefully these concerns can be considered as new operations are introduced.
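One reason the compiled network is less expressive than JavaScript is that it is a static graph: operations and their connectivity are fixed at build time, with no loops or data-dependent branches. A hypothetical sketch of evaluating such a graph (plain Python, illustrative names, not the WebNN API):

```python
# Illustrative sketch: an ML graph is a static DAG of operations, not a
# general program -- no loops or data-dependent branches, which limits
# the control flow an attacker can express through the API.
def evaluate(graph, inputs):
    """Evaluate nodes in topological order. `graph` maps each node name
    to (op_function, list_of_input_node_names) and is assumed to be
    listed in topologically sorted order."""
    values = dict(inputs)
    for name, (op, deps) in graph.items():
        values[name] = op(*(values[d] for d in deps))
    return values
```

The attacker controls which operations appear and how they are wired, but not the evaluation strategy itself, which is why careful per-operation implementation matters more here than sandboxing arbitrary control flow.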
The group was not sure what "operation level metadata" means exactly in this context.
(Would addressing https://github.com/webmachinelearning/webnn/issues/243 satisfy all "operation level metadata" requirements?)
Yes, that would be a helpful step.
The group would like to understand how running timeable things out of process works as a mitigation.
This is a concern of lower importance - mainly, adding an IPC step will confuse any accidental high-precision timers. Realistically, there is not a lot to be done here!
Thank you for your help in hardening the security of this API!
A pleasure, thanks for the detailed response.
PR https://github.com/webmachinelearning/webnn/pull/251 has been updated to address the following:
The remaining work from this issue #241 is to decide how to address:
Our last discussion https://www.w3.org/2022/02/24-webmachinelearning-minutes.html#t06 on that topic was inconclusive. I propose we track #243 separately and mark this "General Security Questions" issue resolved when PR #251 is reviewed and landed.
Thanks for your contributions everyone. We're making strong progress with this important security review topic.
@quidity feel free to chime in #243 to help us move to a direction aligned with your expectations.
I'm reopening this auto-closed issue to circle back to @quidity to confirm he's happy with the general direction the security considerations are taking: https://www.w3.org/TR/webnn/#security (as of writing, the gh-pages deploy is not yet complete and the hosted doc is not yet updated, but that should be fixed soon)
Our next step is to request security part of wide review tracked in https://github.com/webmachinelearning/webnn/issues/239
Thanks - this extra thinking is helpful! I don't think there's more to do here.
@quidity thanks for the confirmation!
Our next step is to request security part of wide review tracked in https://github.com/webmachinelearning/webnn/issues/239
Fyi: Submitted security review request (part of wide review) https://github.com/w3c/security-request/issues/22 and updated https://github.com/webmachinelearning/webnn/issues/239 accordingly.
I'm a security reviewer from Google Chrome and have some general questions - they almost certainly have good answers and it's very possible I missed finding them in the spec. I also have some specific suggestions which I will open as distinct issues shortly.
Sub issues:
- #242
- #243
- #244