w3c / trusted-types

A browser API to prevent DOM-Based Cross Site Scripting in modern web applications.
https://w3c.github.io/trusted-types/dist/spec/

Semantics and naming of types #7

Closed xtofian closed 5 years ago

xtofian commented 7 years ago

Naming

"Trusted" doesn't quite seem the right term to use. In many cases, the value of those types will be entirely or partially derived from untrusted data, however the values will be known to be safe to use in the destination (sink) context due to appropriate validation/sanitization/escaping that has been applied to the original value. For instance, in

var u = TrustedURL.sanitize(untrustedInput)

the string value of u will equal the string value of untrustedInput (i.e. consist entirely of a value from an untrusted source), if untrustedInput passes validation (e.g., is a well-formed http URL).
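For illustration, a minimal sketch of that pass-through behavior (assuming, as an illustration, that sanitize neutralizes values failing validation by substituting an innocuous URL; the exact rejection behavior is an assumption here, not something the draft spec pins down):

var ok = TrustedURL.sanitize('https://example.com/page');
// The string value of ok equals 'https://example.com/page': the
// untrusted input passed validation and survives verbatim.

var bad = TrustedURL.sanitize('javascript:alert(1)');
// The scheme fails validation, so the resulting value is an innocuous
// URL rather than the original string.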

Of course, in some cases a value can be established to be safe due to its trustworthy provenance (e.g. a string consisting of a javascript:... URL can be treated as a SafeUrl if it is known to come from a trusted source), but that's usually just one way of establishing the type contract.

Semantics and type contracts

The type contracts used in Closure have a number of unresolved issues, which stem from the inherent complexity of the web platform.

In the setting of a particular application domain, we have largely been able to ignore or gloss over such issues; however, for the purposes of a standard they should presumably be kept in mind.

For example:

Also worth considering is the granularity of the types: In Closure, safe-content types are relatively coarse grained. For instance, the same type (TrustedResourceUrl) is used to represent URLs that are interpreted as script (<script src>), style (<link rel=stylesheet href>), and embedded content (<iframe src>). This choice was made to reduce the number of types to a manageable vocabulary, and has generally worked out reasonably well in practice. However, it is not clear if this tradeoff between complexity of type vocabulary and expressiveness is appropriate in the general case.

In particular, using an untrusted resource as the src of an (non-sandboxed) <iframe> has obviously much less damage potential (largely limited to social engineering / phishing attacks) than using it as a script source.

Proposal: "value neutral" type contracts

Thus, it seems rather difficult if not impossible to come up with precise specifications of type contracts that prescribe what values are "safe" or "trusted" for a given sink context, such that these specifications generalize well and don't make unwarranted assumptions about the particular application domain or development frameworks.

With that in mind, it may make sense to essentially dodge all these issues and define types and their contracts not in terms of safety/trustedness, but rather simply to characterize the semantics of the specific sink that will interpret the value, and do so at a fairly granular level. (Granularity seems to already be the intent; the proposal states "Introduce a number of types that correspond to the XSS sinks we wish to protect.") IOW, "value neutral" type contracts that focus on how the value will be used, rather than whether or not it is safe.

I.e., types such as

One downside of this "value neutral" type naming is that the security implications of creating values of these types are less prominent. However, instances of the types can only be created via the .unsafelyCreate factory function, which conveys the security aspect (and in larger projects will typically be usage-restricted to ensure security review), as in the sketch below.
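A hedged sketch of that pattern (ScriptURL is an illustrative value-neutral name, not one the proposal defines; the sink wiring is assumed):

// The type name only describes the sink that will interpret the value;
// the security-relevant step is isolated in the factory's name.
var scriptUrl = ScriptURL.unsafelyCreate('https://static.example/app.js');
scriptElement.src = scriptUrl; // this sink accepts only ScriptURL values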

koto commented 7 years ago

Thanks for the thoughtful feedback, I really appreciate it. My thoughts on that are below:

Re: Naming

This name was chosen deliberately. Values assigned to the proposed types are later on trusted by the application. This doesn't imply that they are either trustworthy or safe - it only means that the application author effectively trusts those values (that trust was implicit when strings were used instead; now we are only making it explicit). We do not aim to enforce a stronger contract, i.e. trusted types may still cause an XSS vulnerability even if the URL/HTML is sanitised. One example cause of such a vulnerability is open redirects; another is triggering script gadgets in the included TrustedHTML.

This makes trusted types different from e.g. Closure types - they are looser, in the sense that the contract enforced is a weaker one. We only aim to enable applications whose security evaluation focuses on the parts producing the values that often cause DOM XSS, not on the sinks. Enforcing a stronger contract would be problematic - e.g. in the platform we do not (yet?) have a way of recognising a constant, and the sanitisation support is also quite shaky. Similarly, as you noted, unless the applications are packaged, a document itself (using the trusted types) cannot assure what is hosted under a URL (even if no open redirect is present). We can only do some approximations where we feel they make sense. Precisely for the reasons outlined, I believe the values are effectively trusted, but we don't know if they are safe; both properties can be met if combined with e.g. a build system that validates additional properties at compile time. Using trusted types alone does not make your application DOM XSS-safe, but it makes it easier to verify (in an automated or manual fashion) that this is effectively the case.

Re: Type contracts

We share a lot of the same challenges as Closure's safe type system, but some are unique to each. One important challenge for trusted types is that they must work even when there is no compiler, and in the context of a single document. This is what caused the "downgrade" of the contract we're effectively able to enforce. Keeping that in mind:

Re: value neutral contract

This actually seems similar to what we are already doing - the types are value neutral in the sense that they don't imply safety. We deviate sometimes (e.g. a few fewer types, sometimes a blunt sanitizing / neutralizing builder exposed), but this is more-or-less how I think about the trusted types contract, and what I think the name implies (the author trusts the value to be a script URL that won't cause havoc in the application).

xtofian commented 6 years ago

Well, the whole point of introducing these types is to have a means of distinguishing values that are potentially unsafe to assign to a given sink from those that are indeed safe.

The problem that's solved by these types is that it is essentially impractical to audit all uses of XSS sinks in a non-trivial application, and flows of data into each of these uses. By changing sinks to only accept values of specific types (and/or interposing run-time predicates that validate or sanitize values that aren't already of the sink-specific "safe" type), we reduce this intractable problem to the much simpler problem of auditing only code that produces values of such types.
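A minimal sketch of that enforcement model (variable names are illustrative; assumes an enforcing Trusted Types configuration is active for the document):

var el = document.querySelector('#output');
el.innerHTML = untrustedString;  // rejected: the sink now requires a
                                 // TrustedHTML value, not a plain string
el.innerHTML = someTrustedHtml;  // accepted: the value carries the type,
                                 // so no per-call-site audit is needed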

The purpose of these types is to reduce the "trusted code base" (i.e. code that needs to be assumed and manually verified to be correct) with respect to the security property "application has no XSS bugs" from a large fraction of the application source code (all call-sites of DOM XSS sinks, and all code that flows data into those), to the much smaller set of code consisting of the call-sites and fan-in of unsafelyCreate functions.

I.e., to fully realize the benefits of this approach, developers and security reviewers absolutely have to be able to assume that all instances of the various types have values that are in fact safe to use in their corresponding contexts. (Which is why it's so important that an application code base contains only relatively few call sites of unsafelyCreate, and that those call sites are structured to be "readily reviewable"; see #31.)

I still think the spec should not prescribe a specific type contract, since it's not clear what that contract should be, what security considerations it should address (just XSS? layout-based social engineering-attacks?), or that there even is a single contract that is appropriate for all conceivable web apps.

I think we agree on this; where we differ is that I do believe that in any given use of these types in a particular application, they do have to have a security contract; it's just that this contract might to some extent differ across applications and development organizations; hence the spec shouldn't try to define (let alone enforce) one.

Another way to look at it: In any given application, there'll be an implicit effective contract for each of these types, which is defined as the union of the sets of values produced at the application's reachable call sites of the type's unsafelyCreate function, for all possible, reachable program states. It's the developer's (or security reviewer's) responsibility to ensure that each of these sets only contains values that are safe to assign to the sinks that accept the type.

I.e., there isn't a universal definition of "safe" attached to the types as per the spec, but there certainly is a notion of "safe" that developers/reviewers would attach to these types in the context of their specific code base. The definition of "safe" is whatever makes sense in the context of their app, and is embodied in the implementation of builder/producer APIs that create values of TrustedTypes through use of unsafelyCreate.

(I realize I'm contradicting myself wrt what I said in the original issue under "value neutral" contracts.)

One implication that falls out of this is that any factory functions that are defined in the spec (e.g. TrustedURL.sanitize) must be designed such that the values they produce are within any conceivable notion of "safe" for their corresponding sinks, across applications. This is not necessarily trivial. For instance, should TrustedURL.sanitize accept tel: URLs? On one hand, they're certainly benign with respect to XSS. However, there are contexts where they are problematic (or at least used to be -- e.g., within webviews in iOS, see discussion at https://github.com/golang/go/issues/20586#issuecomment-311503707). And, it's not even clear that the standard should permit arbitrary http/https URLs. For instance, an application might want to ensure that it only renders hyperlinks that refer within the scope of this application itself, and that links to external sites go through a redirector that performs real-time phishing detection or some such.
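A hedged sketch of such an application-specific policy (hyperlinkUrl and the redirector URL are hypothetical, not part of the spec):

// Same-application links pass through; links to external sites are
// routed via a phishing-checking redirector, embodying this app's own
// notion of "safe" for hyperlink URLs.
function hyperlinkUrl(url) {
  var parsed = new URL(url, location.href);
  if (parsed.origin === location.origin) {
    return TrustedURL.unsafelyCreate(parsed.href);
  }
  return TrustedURL.unsafelyCreate(
      'https://redirector.example/?target=' + encodeURIComponent(parsed.href));
}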

This indicates that perhaps the spec shouldn't provide for factory methods for these types at all. This might be a quite reasonable direction: I'd expect these types in the currently spec'd form (with very basic factory functions) to not be all that useful on their own.

To be practically useful, these types will need factories/builders for many common scenarios, e.g.:

Of course, all of these factory functions can be replaced with ad-hoc code that relies on unsafelyCreate. However, that's very undesirable since it'll result in a relatively large number of such calls throughout the application source code, which will significantly erode the reviewability benefits.

I.e., I'd expect these types to be used as a core building block of frameworks and libraries that endow them with richer semantics (and corresponding builders and factories), such as the Closure SafeHtml types and their builders, or the SafeHtml types in Angular.

(aside: the one factory function in the current draft spec that's reasonably uncontroversial is TrustedHTML.escape -- escaped text should be within the notion of "safe for HTML" in the context of any conceivable application; however it's also not particularly useful in practice -- applications should just assign to textContent instead of using innerHTML with TrustedHTML.escape).
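To make the aside concrete, the two approaches side by side (a sketch; escape is the draft-spec factory discussed above, userText is illustrative):

el.innerHTML = TrustedHTML.escape(userText); // escaped, inert markup
el.textContent = userText;                   // equivalent rendered result,
                                             // without invoking the HTML parser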

xtofian commented 6 years ago

Re: Naming. One concern with the use of "trusted" in type names is that among security engineers, "trusted" refers to a thing that we need to trust (i.e. depend on) in order for a security property of our system to hold (as opposed to something trustworthy, which we can simply trust, so to speak).

With the introduction of the types, we've exactly removed the need to trust each and every value that flows into a sink (and the code that's involved in accomplishing that); we now only need to trust the code that's involved in producing new instances of such types (i.e. call sites of unsafelyCreate and their fan-in).

xtofian commented 6 years ago

Another naming-related comment: unsafelyCreate is somewhat inaccurate as well: in a correct and non-vulnerable program, all call sites of unsafelyCreate are in fact safe, in the sense that for any reachable program state, the string passed to T.unsafelyCreate must be a string that is safe to use in the sink context corresponding to type T. If that were not the case, the program would have a vulnerability.

I.e. these conversion functions are potentially unsafe, but must only be used in scenarios where they are actually safe (as established through security review of each call-site).
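A sketch of what such a reviewed call site might look like (the constant and the review-comment convention are illustrative only):

// SECURITY REVIEW: safe because the argument is a compile-time constant;
// no attacker-controlled data can reach this call site.
var SPINNER_HTML = TrustedHTML.unsafelyCreate('<div class="spinner"></div>');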

koto commented 6 years ago

I extracted the sanitizers discussion to #32, let's leave this bug for the naming concerns.

... to which I so far have an unsatisfactory answer.

We chose trusted, as for us that meant not that we need to trust the values (not to introduce XSS), but that we are trusting them. If the application using TT has an XSS vulnerability, it's because it trusted a value that was not trustworthy. This signifies that creating values of such types should be audited, as later on the application trusts those values to be safe. Briefly looking at the definition of trusted systems in information theory, it seems to match that definition too.

I still think the TT name captures our intention correctly, but I don't have any strong opinion. If there's a compelling alternative name, I'm all for it. It looks like with the introduction of sanitizers (#32) and the ability to lock down unsafe creations (#31), we might be able to make Trusted Types safe (wink wink nudge nudge).

xtofian commented 6 years ago

I don't have a terribly strong opinion on naming either. And I agree with you -- in a code base where unchecked conversions (unsafelyCreate) are locked down, it really does not matter all that much: Application developers don't really need to understand the nuanced semantics of the types. All they need to know is that they have to provide a value of the right type for the sink in order for it to be accepted. With calls to unsafelyCreate being constrained, the only way of doing that is via a (safe) builder API. Only developers / security engineers who write/review code that uses unsafelyCreate do have to understand the purpose and semantics of these types, and the implications of using them incorrectly.

In other words, neither developer usability nor the resulting security posture should be significantly affected even if we called the types FrobHTML and so forth: If a developer writes foo.innerHTML = someString, they'd get an exception that .innerHTML wants a FrobHTML. They'd then go and read the documentation on how to make values of type FrobHTML. APIs to create FrobHTMLs (except for unsafelyCreate, but that's usage-restricted) are designed to be inherently safe, in the sense that no matter how they are called, they produce FrobHTML values that will not result in security vulnerabilities when assigned to .innerHTML.

FWIW, I'm not totally happy with the naming convention of SafeHtml either. In particular, there are scenarios where a value is a priori unsafe in the current context, but is used in a way that is actually safe. For instance,

let safeHtml = SafeHtml.unsafelyCreate(someAttackerControlledString);
let iframe = document.createElement('iframe');
iframe.sandbox = 'allow-scripts';  // sandboxed: no same-origin privileges
iframe.srcdoc = safeHtml;

This code is safe (wrt XSS) because the untrusted HTML is populated into a sandbox iframe. However, the safeHtml instance is absolutely not safe in the current context, and it's not all that appropriate that the type is called SafeHtml here (i.e., it'd be a security bug to assign it to .innerHTML even though SafeHtml is normally supposed to be safe for that).

This dissonance arises because the type contract is implicitly dependent on the origin the value is used in, but the type name doesn't capture that.

OTOH, the above example doesn't sound any better with TrustedHTML. This led me to the suggestion to not include Trusted or Safe or similar notions in the type names, and rather have them represent "values that are accepted by the corresponding sink" (e.g., HyperLinkURL is the type that's accepted and required by <a href="...">).

At the same time, this discussion has shown that for these types to be useful, their contracts (as implicitly defined by the union of their builders in scope of the app) do have to have a notion of "safety" or "trustedness" -- the whole point is that assigning a value of a given type to a corresponding sink that accepts that type should never result in a security vulnerability.

Anyway, naming is hard...

xtofian commented 6 years ago

Another way to look at it: With TrustedTypes enabled/enforced, the application's TCB wrt the security property "there are no XSS vulnerabilities" consists of the types, or more specifically, the code involved in constructing instances of the types (i.e. call sites of unsafelyCreate and surrounding code). IOW, we want the property that if the application were to have an XSS, the root cause has to be in that TCB.

In that sense, using the term "trusted" in type names is kind of appropriate. I'm still a little bothered however by the fact that in the meaning of "trusted" as in "TCB", it's not the types or their values that are being trusted, but rather the code surrounding call sites of unsafelyCreate.

(of course, the browser itself, its implementation of the SOP and so on, is part of the XSS-relevant TCB too; but in this context we're interested in bugs in application code, not the platform itself)

koto commented 5 years ago

Let's just go with Trusted, it seems the name stuck. Future visitors, please reopen if you feel strongly against it.