w3c / adapt

Semantics to describe user personalization preferences.
https://w3c.github.io/adapt/

PING Self review - Module 1: Adaptable Content #130

Closed diagram-codesprint closed 4 years ago

diagram-codesprint commented 4 years ago

PING Questionnaire for Personalization Semantics Content Module 1.0

The answers below often reference the potential to expose information about a user based on the settings enabled to modify content in order to personalize it to meet the user's needs. In addition to the information contained in this spec, there are other technologies it builds upon which are not covered here, including JSON-LD, HTML, CSS, HTTP, and HTTPS.

Questions to Consider

2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?

Just because information can be exposed to the web doesn’t mean that it should be. How does exposing this information to an origin benefit a user? Is the benefit outweighed by the potential risks? If so, how?

In answering this question, it often helps to ensure that the use cases your feature and specification enable are made clear in the specification itself, so that TAG and PING understand the feature-privacy tradeoffs being made.

2.2. Is this specification exposing the minimum amount of information necessary to power the feature?

Regardless of what data is being exposed, is the specification exposing the bare minimum necessary to achieve the desired use cases? If not, why not and why expose the additional information?

2.3. How does this specification deal with personal information or personally-identifiable information or information derived thereof?

Personal information is data about a user (home address) or information that could be used to identify a user (alias or email address). This is distinct from personally identifiable information (PII), as the exact definition of what's considered PII varies from jurisdiction to jurisdiction.

If the specification under consideration exposes personal information or PII or their derivatives that could still identify an individual to the web, it’s important to consider ways to mitigate the obvious impacts. For instance:

- A feature which uses biometric data (fingerprints or retina scans) should refuse to expose the raw data to the web, instead using the raw data only to unlock some origin-specific and ephemeral secret and transmitting that secret instead.
- Including a factor of user mediation should be considered, in order to ensure that no data is exposed without a user's explicit choice (and, hopefully, understanding). One way to achieve this may be the use of the Permissions API [PERMISSIONS], or additional dialogs like those in the Payment Request API [PAYMENT-REQUEST-API].

2.4. How does this specification deal with sensitive information?

Just because data is not personal information or PII, that does not mean that it is not sensitive information; moreover, whether any given information is sensitive may vary from user to user. Data to consider as potentially sensitive includes: financial data, credentials, health information, and location. When this data is exposed to the web, steps should be taken to mitigate the risk of exposing it; for example:

- Credential Management [CREDENTIAL-MANAGEMENT] allows sites to request a user's credentials from a user agent's password manager in order to sign the user in quickly and easily. This opens the door for abuse, as a single XSS vulnerability could expose user data trivially to JavaScript. The Credential Management API mitigates the risk by offering the username and password as only an opaque FormData object which cannot be directly read by JavaScript, and strongly suggests that authors use Content Security Policy [CSP] with reasonable connect-src and form-action values to further mitigate the risk of exfiltration.
- Geolocation information can serve many use cases at a much less granular precision than the user agent can offer. For instance, a restaurant recommendation can be generated by asking for a user's city-level location rather than a position accurate to the centimeter.
- A Geofencing proposal [GEOFENCING] ties itself to service workers and therefore to encrypted and authenticated origins.

2.5. Does this specification introduce new state for an origin that persists across browsing sessions?

Allowing an origin to persist data on a user's device across browsing sessions introduces the risk that this state may be used to track a user without their knowledge or control, either in first-party or third-party contexts. New state persistence mechanisms should not be introduced without mitigations to prevent them from being used to track users across domains, or without control over clearing this state. Also, are there specific caches that a user agent should specially consider?
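
To make the risk concrete before the prose examples that follow, here is a minimal, hypothetical sketch of how ordinary origin storage turns into a persistent cross-session identifier when no clearing controls are in place (the storage key name is invented for illustration):

```ts
// Hypothetical illustration: ordinary origin storage becomes a persistent
// cross-session identifier if the user has no way to clear it.
function getOrCreateVisitorId(): string {
  const KEY = "visitor-id"; // hypothetical key name
  let id = localStorage.getItem(KEY);
  if (id === null) {
    id = crypto.randomUUID();      // random value minted on first visit
    localStorage.setItem(KEY, id); // survives browser restarts until cleared
  }
  return id;
}
```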

For example:

- Service Workers [SERVICE-WORKERS] intercept all requests made by an origin, allowing sites to function perfectly even when offline. A maliciously-injected service worker, however, would be devastating (as documented in that spec's security considerations section). The specification mitigates the risks that an active network attacker or XSS vulnerability presents by requiring an encrypted and authenticated connection in order to register a service worker.
- Platform-specific DRM implementations might expose origin-specific information in order to help identify users and determine whether they ought to be granted access to a specific piece of media. These kinds of identifiers should be carefully evaluated to determine how abuse can be mitigated; identifiers which a user cannot easily change are very valuable from a tracking perspective, and protecting the identifiers from an active network attacker is an important concern.
- Cookies, ETag, Last-Modified, Local Storage, Indexed DB, etc. all allow an origin to store information about a user and retrieve it later, directly or indirectly. User agents mitigate the risk that these kinds of storage mechanisms will form a persistent identifier by offering users the ability to wipe out the data contained in these types of storage.

2.6. What information from the underlying platform, e.g. configuration data, is exposed by this specification to an origin?

If so, is the information exposed from the underlying platform consistent across origins? This includes but is not limited to information relating to the user configuration, system information including sensors, and communication methods.

When a specification exposes specific information about a host to an origin, and that information changes rarely and does not vary across origins, it can be used to uniquely identify a user across two origins: either directly, because any given piece of information is unique, or because the combination of disparate pieces of information is unique and can be used to form a fingerprint [DOTY-FINGERPRINTING]. Specifications and user agents should address the risk of fingerprinting by carefully considering the surface of available information and the relative differences between software and hardware stacks. Sometimes reducing fingerprintability may be as simple as ensuring consistency (e.g. ordering the list of fonts), but sometimes it may be more complex.

Such information should not be revealed to an origin without a user’s knowledge and consent barring mitigations in the specification to prevent the information from being uniquely identifying or able to unexpectedly exfiltrate data.
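
As a hedged illustration of how individually bland platform details combine into an identifier, the following sketch hashes a handful of commonly exposed values; the exact set of inputs is illustrative, not drawn from this specification:

```ts
// Hypothetical sketch: each value alone may seem harmless, but together they
// can uniquely identify a device across origins. Requires a secure context
// for crypto.subtle.
async function roughFingerprint(): Promise<string> {
  const parts = [
    navigator.userAgent,
    navigator.language,
    String(screen.width * screen.height),
    String(screen.colorDepth),
    Intl.DateTimeFormat().resolvedOptions().timeZone,
  ].join("|");
  const digest = await crypto.subtle.digest(
    "SHA-256",
    new TextEncoder().encode(parts)
  );
  // Hex-encode the hash to get a compact, stable identifier.
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}
```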

For example:

- The GL_RENDERER string exposed by some WebGL implementations improves performance in some kinds of applications, but does so at the cost of adding persistent state to a user's fingerprint. These kinds of device-level details should be carefully weighed to ensure that the costs are outweighed by the benefits.
- The NavigatorPlugins list exposed via the DOM practically never changes for most users. Some user agents have taken steps to reduce the entropy introduced by disallowing direct enumeration of the plugin list.

2.7. Does this specification allow an origin access to sensors on a user's device?

If so, what kind of sensors and information derived from those sensors does this standard expose to origins?

Information from sensors may serve as a fingerprinting vector across origins. In addition, a sensor reveals something about the user's device or environment, and that fact itself might be what is sensitive. Moreover, as technology advances, mitigations in place at the time a specification is written may have to be reconsidered as the threat landscape changes.

Sensor data might even become a cross-origin identifier when the sensor reading is relatively stable over short time periods (seconds, minutes, even days) and is consistent across origins. In fact, if two user agents expose the same sensor data in the same way, it may become a cross-browser, possibly even a cross-device, identifier.
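
For instance, here is a minimal sketch of how freely delivered sensor readings reach script, using the long-standing deviceorientation event (some platforms additionally gate it behind a permission prompt):

```ts
// Hypothetical sketch: raw orientation readings are delivered to any
// listening script at the platform's native rate. Stable or high-resolution
// readings like these are the kind of data that can become a fingerprinting
// vector or a side channel.
window.addEventListener("deviceorientation", (event: DeviceOrientationEvent) => {
  const { alpha, beta, gamma } = event; // rotation around z, x, y axes, in degrees
  console.log(`alpha=${alpha} beta=${beta} gamma=${gamma}`);
});
```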

These are not theoretical attacks, for example:

- As gyroscopes advanced, their sampling rate had to be lowered to prevent them from being used as a microphone [GYROSPEECHRECOGNITION].
- ALS sensors could allow an attacker to exfiltrate whether or not a user had visited given links [OLEJNIK-ALS].
- Even relatively short-lived data, like the battery status, may be able to serve as an identifier if misused / abused [OLEJNIK-BATTERY].

2.8. What data does this specification expose to an origin? Please also document what data is identical to data exposed by other features, in the same or different contexts.

As noted above in § 3.3 Same-Origin Policy Violations, the same-origin policy is an important security barrier that new features need to carefully consider. If a specification exposes details about another origin’s state, or allows POST or GET requests to be made to another origin, the consequences can be severe.
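
As a hedged sketch of the cross-origin request surface discussed here (and in the Beacon example below), consider the following; the endpoint shown is hypothetical:

```ts
// Hypothetical sketch: a page can queue a small cross-origin POST that the
// browser sends even as the page unloads. The Beacon working group judged
// this no worse than ordinary form submission, so no extra mitigation was
// added.
const endpoint = "https://analytics.example/collect"; // hypothetical third-party endpoint
const payload = JSON.stringify({ event: "page-unload", ts: Date.now() });
navigator.sendBeacon(endpoint, payload);
```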

- Content Security Policy [CSP] unintentionally exposed redirect targets cross-origin by allowing one origin to infer details about another origin through violation reports (see [HOMAKOV]). The working group eventually mitigated the risk by reducing a policy's granularity after a redirect.
- Beacon [BEACON] allows an origin to send POST requests to an endpoint on another origin. They decided that this feature didn't add any new attack surface above and beyond what normal form submission entails, so no extra mitigation was necessary.

2.9. Does this specification enable new script execution/loading mechanisms?

- HTML Imports [HTML-IMPORTS] create a new script-loading mechanism, using link rather than script, which might be easy to overlook when evaluating an application's attack surface. The working group notes this risk, and ensured that they required reasonable interactions with Content Security Policy's script-src directive.
- New string-to-script mechanism? (e.g. eval() or setTimeout([string], ...))
- What about style?

2.10. Does this specification allow an origin to access other devices?

If so, what devices does this specification allow an origin to access?

Accessing other devices, both via network connections and via direct connection to the user's machine (e.g. via Bluetooth, NFC, or USB), could expose vulnerabilities: some of these devices were not created with web connectivity in mind and may be inadequately hardened against malicious input, or against use on the web.
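
A minimal sketch of what user-mediated device access looks like from script, assuming Web Bluetooth as the example API; the service filter is illustrative:

```ts
// Hypothetical sketch: device access gated behind a user-mediated chooser.
// Web Bluetooth is only available in some browsers, only in secure contexts,
// and (in TypeScript) needs the web-bluetooth type definitions.
async function pickHeartRateDevice(): Promise<void> {
  const device = await navigator.bluetooth.requestDevice({
    filters: [{ services: ["heart_rate"] }], // the user must pick a device from a browser chooser
  });
  console.log("User granted access to:", device.name);
}
```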

Exposing other devices on a user’s local network also has significant privacy risk:

- If two user agents have the same devices on their local network, an attacker may infer that the two user agents are running on the same host or are being used by two separate users who are in the same physical location.
- Enumerating the devices on a user's local network provides significant entropy that an attacker may use to fingerprint the user agent.
- If the specification exposes persistent or long-lived identifiers of local network devices, that provides attackers with a way to track a user over time, even if the user takes steps to prevent such tracking (e.g. clearing cookies and other stateful tracking mechanisms).
- Direct connections might also be used to bypass security checks that other APIs would provide. For example, attackers used the WebUSB API to access other sites' credentials on a hardware security key, bypassing same-origin checks in an early U2F API. [YUBIKEY-ATTACK]

Example mitigations include:

- The Network Service Discovery API [DISCOVERY] recommends CORS preflights before granting access to a device, and requires user agents to involve the user with a permission request of some kind. The spec's "Security and privacy considerations" section has more details.
- Likewise, the Web Bluetooth specification [BLUETOOTH] has an extensive discussion of security and privacy considerations, which is worth reading as an example for similar work.
- The WebUSB standard addresses these risks through a combination of user mediation / prompting, secure origins, and feature policy. [WEBUSB]

2.11. Does this specification allow an origin some measure of control over a user agent's native UI?

Features that allow for control over a user agent's UI (e.g. full screen mode) or changes to the underlying system (e.g. installing an 'app' on a smartphone home screen) may surprise users or obscure security / privacy controls. To the extent that your feature does allow for the changing of a user agent's UI, can it affect security / privacy controls? What analysis confirmed this conclusion?
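
For example, here is a minimal sketch of one such UI control, the Fullscreen API, which hides most of the browser chrome and is therefore gated on a user gesture (the button selector is hypothetical):

```ts
// Hypothetical sketch: entering full screen hides most browser UI, including
// the address bar and its security indicators, which is why user agents
// require a user gesture and show a transient notice.
const button = document.querySelector<HTMLButtonElement>("#go-fullscreen"); // hypothetical button
button?.addEventListener("click", () => {
  document.documentElement.requestFullscreen(); // only allowed from a user gesture
});
document.addEventListener("fullscreenchange", () => {
  console.log("Fullscreen element is now:", document.fullscreenElement);
});
```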

2.12. What temporary identifiers might this specification create or expose to the web?

If a standard exposes a temporary identifier to the web, the identifier should be short lived and should rotate on some regular duration to mitigate the risk of this identifier being used to track a user over time. When a user clears state in their user agent, these temporary identifiers should be cleared to prevent re-correlation of state using a temporary identifier.
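
A hedged sketch of what such rotation might look like in script terms, assuming a hypothetical 15-minute lifetime and session-scoped storage:

```ts
// Hypothetical sketch of a rotating, short-lived identifier: the value is
// regenerated after a fixed lifetime and kept only in sessionStorage, so it
// does not outlive the session and cannot be re-correlated after clearing.
const LIFETIME_MS = 15 * 60 * 1000; // hypothetical 15-minute rotation period

function currentTemporaryId(): string {
  const raw = sessionStorage.getItem("tmp-id"); // hypothetical key name
  if (raw) {
    const { id, created } = JSON.parse(raw) as { id: string; created: number };
    if (Date.now() - created < LIFETIME_MS) return id; // still fresh, reuse it
  }
  const fresh = { id: crypto.randomUUID(), created: Date.now() };
  sessionStorage.setItem("tmp-id", JSON.stringify(fresh));
  return fresh.id;
}
```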

If this specification does create or expose a temporary identifier to the web, how is it exposed, when, to what entities, and, how frequently is it rotated?

Example temporary identifiers include TLS Channel ID, Session Tickets, and IPv6 addresses.

An example implementation of a privacy-friendly temporary identifier:

- The index attribute in the Gamepad API [GAMEPAD], which is an integer that starts at zero, increments, and is reset.

2.13. How does this specification distinguish between behavior in first-party and third-party contexts?

The behavior of a feature should be considered not just in the context of its being used by a first-party origin that a user is visiting, but also in terms of the implications of its being used by an arbitrary third party that the first party includes. When developing your specification, consider the implications of its use by third-party resources on a page, and consider whether support for use by third-party resources should be optional in order to conform to the specification. If supporting use by third-party resources is mandatory for conformance, please explain why and what privacy mitigations are in place. This is particularly important as user agents may take steps to reduce the availability or functionality of certain features to third parties if the third parties are found to be abusing the functionality.
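
One concrete, hedged illustration of a first party constraining a third-party resource is an iframe Permissions Policy allowlist; the embedded origin and feature list below are invented for illustration:

```ts
// Hypothetical sketch: the first party embeds a third-party widget but uses
// the iframe "allow" attribute (Permissions Policy) to deny it access to
// powerful features, regardless of what the embedded document requests.
const thirdParty = document.createElement("iframe");
thirdParty.src = "https://widgets.example/embed"; // hypothetical third-party widget
thirdParty.allow = "geolocation 'none'; camera 'none'; microphone 'none'";
document.body.appendChild(thirdParty);
```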

2.14. How does this specification work in the context of a user agent’s Private Browsing or "incognito" mode?

Each major user agent implements a private browsing / incognito mode feature with significant variation across user agents in threat models, functionality, and descriptions to users regarding the protections afforded [WU-PRIVATE-BROWSING].

One typical commonality across user agents' private browsing / incognito modes is that they have a different set of state than the user agents' 'normal' modes.

Does the specification provide information that would allow for the correlation of a single user’s activity across normal and private browsing / incognito modes? Does the specification result in information being written to a user’s host that would persist following a private browsing / incognito mode session ending?

There has been research into both:

- Detecting whether a user agent is in private browsing mode [RIVERA] using non-standardized methods such as Firefox's requestFileSystem.
- Using features to fingerprint a browser and correlate private and non-private mode sessions for a given user. [OLEJNIK-PAYMENTS]

2.15. Does this specification have a "Security Considerations" and "Privacy Considerations" section?

Documenting the various concerns and potential abuses in "Security Considerations" and "Privacy Considerations" sections of a document is a good way to help implementers and web developers understand the risks that a feature presents, and to ensure that adequate mitigations are in place. Simply adding a section to your specification with yes/no responses to the questions in this document is insufficient.

If it seems like a feature does not have security or privacy impacts, then say so inline in the spec section for that feature:

> There are no known security or privacy impacts of this feature.

Saying so explicitly in the specification serves several purposes:

- Shows that a spec author/editor has explicitly considered security and privacy when designing a feature.
- Provides some sense of confidence that there might be no such impacts.
- Challenges security- and privacy-minded individuals to think of and find even the potential for such impacts.
- Demonstrates the spec author/editor's receptivity to feedback about such impacts.
- Demonstrates a desire that the specification should not be introducing security and privacy issues.

[RFC3552] provides general advice as to writing Security Considerations sections. Generally, there should be a clear description of the kinds of privacy risks the new specification introduces to users of the web platform. Below is a set of considerations, informed by that RFC, for writing a privacy considerations section.

Authors must describe:

- What privacy attacks have been considered?
- What privacy attacks have been deemed out of scope (and why)?
- What privacy mitigations have been implemented?
- What privacy mitigations have been considered and not implemented (and why)?

In addition, the attacks considered must include:

- Fingerprinting risk;
- Unexpected exfiltration of data through abuse of sensors;
- Unexpected usage of the specification / feature by third parties.

If the specification includes identifiers, the authors must document what rotation period was selected for the identifiers and why. If the specification introduces new state to the user agent, the authors must document what guidance regarding clearing said storage was given and why. There should be a clear description of the residual risk to the user after the privacy mitigations have been implemented.

The crucial aspect is to actually consider security and privacy. All new specifications must have security and privacy considerations sections to be considered for wide review. Interesting features added to the web platform generally already have security and/or privacy impacts.

2.16. Does this specification allow downgrading default security characteristics?

Does this feature allow for a site to opt-out of security settings to accomplish some piece of functionality? If so, in what situations does your specification allow such security setting downgrading and what mitigations are in place to make sure optional downgrading doesn’t dramatically increase risks?
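
As a hedged sketch of the first downgrade listed below, setting document.domain (now deprecated) relaxes the same-origin policy between cooperating subdomains:

```ts
// Hypothetical sketch, assuming this script runs on https://app.example.com.
// Setting document.domain (a deprecated API) opts the document out of
// subdomain isolation; a frame on https://cdn.example.com that does the same
// can then script this document directly, bypassing the default same-origin
// boundary between the two subdomains.
document.domain = "example.com";
```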

- document.domain
- [CORS]
- [WEBMESSAGING]
- referrer 'unsafe-always'

2.17. What should this questionnaire have asked?

This questionnaire is not exhaustive. After completing a privacy review, it may be that there are privacy aspects of your specification that a strict reading of, and response to, this questionnaire would not have revealed. If this is the case, please convey those privacy concerns, and indicate whether you can think of improved or new questions that would have covered this aspect.