w3c / trusted-types

A browser API to prevent DOM-Based Cross Site Scripting in modern web applications.
https://w3c.github.io/trusted-types/dist/spec/
Other
605 stars 74 forks source link

Defer `fromLiteral`? #398

Closed lukewarlow closed 10 months ago

lukewarlow commented 10 months ago

While the hypothetical use case for fromLiteral is explained it's unclear if that actually translates into real world usage of this API.

It also suffers from implementability issues, the underlying TemplateMap functionality isn't implemented inside of JavaScriptCore edit: it is implemented in JSC but SpiderMonkey apparently has an alternative mechanism. It never shipped inside of Chromium despite having an intent to ship and this seems to be as a result of technical difficulties in the implementation. I imagine similar issues may exist in Gecko.

There's also no way to properly polyfill this without exposing a potential bypass (no userland way to detect a true template literal).

With all this in mind it would make sense to defer this to v2 of the spec, that way there's more time for something like https://tc39.es/proposal-array-is-template-object/ to progress to make both polyfilling and implementation easier.

cc @mbrodesser-Igalia @bkardell

lukewarlow commented 10 months ago

There's also currently spec issues with this functionality see https://github.com/w3c/trusted-types/issues/374

caridy commented 10 months ago

At salesforce, we are ok deferring fromLiteral.

mbrodesser-Igalia commented 10 months ago

Wondering if deferring fromLiteral would prevent web-developers from enforcing to forbid trusted type policies.

That's because Content-Security-Policy: trusted-types 'none'; require-trusted-types-for 'script'allows the usage of fromLiteral. Hence simplifying to transition away from trusted types policies to non-XSS APIs instead. The latter is mentioned as a goal in https://github.com/mozilla/standards-positions/issues/20#issuecomment-1783279722.

Here, experience from people having actually applied TT could be beneficial.

lukewarlow commented 10 months ago

I think it's definitely a valuable addition to the API in terms of simplifying certain code paths. But I think given Google and other's success without it (it's not shipped in Chrome) + the implementation difficulties + lack of polyfillability (which I think is quite important) it's probably worth defering.

I agree the end goal is to just do trusted-types; (or trusted-types 'none') and just not using XSS APIs. I think the reality of that will depend on the santizer API and whether it effectively counts as a trusted sink. I'm unaware how else you'd use web components for example without having at least some XSS sink usage.

mbrodesser-Igalia commented 10 months ago

I think it's definitely a valuable addition to the API in terms of simplifying certain code paths. But I think given Google and other's success without it (it's not shipped in Chrome) + the implementation difficulties + lack of polyfillability (which I think is quite important) it's probably worth defering.

Makes sense.

I agree the end goal is to just do trusted-types; (or trusted-types 'none') and just not using XSS APIs. I think the reality of that will depend on the santizer API and whether it effectively counts as a trusted sink.

The API of a "trusted sink" is unclear to me. Presumably it's meant that the Sanitizer API produces trusted types. Anyway, don't need to discuss it in this ticket.

I'm unaware how else you'd use web components for example without having at least some XSS sink usage.

koto commented 10 months ago

Here, experience from people having actually applied TT could be beneficial.

In our experience we quite often encountered a need to generate a constant TT value (most often it was a short HTML snippet, or a constant script URL). We also encountered some JS build tool that outputted code wrapped in a series of eval(a constant string) calls.

To make sure these are TT compliant, we have used the quite verbose TT policy API (createPolicy + createXYZ calls).

Some of these cases can be refactored not use DOM XSS sinks (e.g. for HTML we could use a series of createElement, appendChild), but in some cases it proved non-trivial (e.g. changing the JS compilation steps was very involved) or impossible (e.g. the application absolutely needs loading a script dynamically, from one of the passed constant string values). fromLiteral is a result of that - had it been available, our migrations would have been much, much simpler. I suspect this will be the case even more when dealing with 3rd party libraries -- getting a patch adding fromLiteral is much easier than one that creates intermediary policy objects only to use their output values once.

Paraphrasing, TT with fromLiteral is much more useful to the authors than without - we indeed have completed many migrations without fromLiteral but would have much preferred if it was available at that time.

What is materially important here is the implementation difficulty (as the spec part is trivial) - are there some estimates of it for non-v8 engines? @otherdaniel, I believe we actually are on the verge of shipping it, is that correct?

lukewarlow commented 10 months ago

are there some estimates of it for non-v8 engines?

I can't speak for Gecko but in WebKit the Ecmascript feature this spec relies on ([TemplateMap] slot in realms) is unimplemented. So it would require changes to JavaScriptCore I'm unable to give an estimate on how long that would take.

As for Chrome I can see that the main issue https://bugs.chromium.org/p/chromium/issues/detail?id=1271149 hasn't been materially changed for over a year now. It's marked as blocked on https://bugs.chromium.org/p/v8/issues/detail?id=13324 which similarly hasn't been touched in over a year.

koto commented 10 months ago

I can't speak for Gecko but in WebKit the Ecmascript feature this spec relies on ([TemplateMap] slot in realms) is unimplemented.

Wouldn't that be observable? I'm not very familiar with the ES algorithms, but the early return in https://tc39.es/ecma262/#sec-gettemplateobject suggests that a template map is necessary to prevent observable differences between implementations.

lukewarlow commented 10 months ago

I believe this is the implementation of that algorithm in JavaScriptCore https://searchfox.org/wubkat/source/Source/JavaScriptCore/runtime/JSTemplateObjectDescriptor.cpp#58

Edit: It's actually here https://searchfox.org/wubkat/source/Source/JavaScriptCore/runtime/ScriptExecutable.cpp#430

caridy commented 10 months ago

From the ES Spec point of view, I always thought this was going to be a problem if we use the same mechanism that we use TemplateMap, and even if we use some other mechanism, I'm not very optimistic about it. Few notes:

  1. [[TemplateMap]] also include dynamic values (the array associate to it), which cannot be supported in TT.
  2. [[TemplateMap]] is used during parsing (via GetTemplateObject) when a template literal is encountered, but fromLiteral should support regular strings as well.
  3. what about indirect eval? can you use fromLiteral with indirect eval? if so, then how do you know when to map those values into the internal slot? that is unclear to me.
  4. To support indirect eval, you will have to map every string during parsing, and every template literal without interpolation, which will probably have some performance implication.
lukewarlow commented 10 months ago

Okay so after some further investigation JSC does have a template map so that could probably be used, though the spec as is is o(n) which isn't ideal.

Alternatively if we used https://tc39.es/proposal-array-is-template-object/ it would be o(1).

I have an implementation of that (the internal slots bit not the exposed static function) and using that makes fromLiteral trivial to implement.

lukewarlow commented 10 months ago

@caridy

1 [[TemplateMap]] also include dynamic values (the array associate to it), which cannot be supported in TT.

I don't fully understand what you mean here? The check to make sure there's no variables is separate to the check for a valid template.

2 [[TemplateMap]] is used during parsing (via GetTemplateObject) when a template literal is encountered, but fromLiteral should support regular strings as well.

It can't support normal strings as there's no way to guartunee they're static. fromLiteral can only work with template literals.

3 what about indirect eval? can you use fromLiteral with indirect eval? if so, then how do you know when to map those values into the internal slot? that is unclear to me. 4 To support indirect eval, you will have to map every string during parsing, and every template literal without interpolation, which will probably have some performance implication.

I dont fully understand what this bit is talking about enough to comment.

koto commented 10 months ago

From the ES Spec point of view, I always thought this was going to be a problem if we use the same mechanism that we use TemplateMap, and even if we use some other mechanism, I'm not very optimistic about it. Few notes:

  1. [[TemplateMap]] also include dynamic values (the array associate to it), which cannot be supported in TT.

https://w3c.github.io/trusted-types/dist/spec/#create-a-trusted-type-from-literal-algorithm step 3 should reject the templates with dynamic values (actually, with any interpolation).

  1. [[TemplateMap]] is used during parsing (via GetTemplateObject) when a template literal is encountered, but fromLiteral should support regular strings as well.

I'm not sure I understand. The intention is:

TrustedScript.fromLiteral`foo` // works
TrustedScript.fromLieral`foo${''}` // fails
TrustedScript.fromLiteral(..ANYTHING...) // fails

Is the current TT spec not written to that effect?

lukewarlow commented 10 months ago

For reference here's my prototype implementation of this based on https://tc39.es/proposal-array-is-template-object/ for the check templatedness aspect.

https://github.com/lukewarlow/WebKit/commit/f24c4837e66fd6cd205e77848ca65a2c152d0a9f

caridy commented 10 months ago

I was assuming that regular strings were going to be supported based on previous examples, e.g. : eval(TrustedTypes.fromLiteral("1")).

I don't know how I feel about this. I will socialize the idea with some folks, but we put a lot of efforts to make template literals to match the semantics of strings, and can be used interchangeable. As far as I can tell, so far we have no API that distinguishes between them, e.g. : foo === "foo". To be more specific, this seems very very odd:

eval(TrustedTypes.fromLiteral("1")); // throws or no-op
eval(TrustedTypes.fromLiteral(`1`)); // works

/cc @erights

Sora2455 commented 10 months ago

I was assuming that regular strings were going to be supported based on previous examples

The trouble as I understand it is that the code can't tell the difference between: eval(TrustedTypes.fromLiteral("1")) and eval(TrustedTypes.fromLiteral("1" + someRandomXssVector)), while it can tell the difference between

eval(TrustedTypes.fromLiteral(`1`))

and

eval(TrustedTypes.fromLiteral(`1` + someRandomXssVector))

(as the second example is no longer a template string, but just a plain string).

(If I've understood correctly).

koto commented 10 months ago

@caridy, I'll overcommunicate in case there are misunderstandings. There are 3 static functions (TrustedHTML.fromLiteral, TrustedScript.fromLiteral, TrustedScriptURL.fromLiteral), but for simplicity we can just say fromLiteral.

// your example
eval(fromLiteral("1")); 
eval(fromLiteral(`1`)); 

Both are the same, since you're calling fromLiteral as a regular function and its arguments are stringified before the function executes. fromLiteral would throw in both cases. The only way for fromLiteral not to throw is if it would be called using the fromLiteral`a_string` syntax, or:

function otherTag(ts) {
return ts
}
fromLiteral(otherTag`foo`)

The property we're after is really that a TemplateStringsArray was created from syntax, and not by manually constructing an array.

lukewarlow commented 10 months ago

As I understand it the code example below is the thing that got hung up on in tc39 preventing it (the proposal I mentioned above) moving to stage 3? This working is absolutely fine for trusted types and I imagine any other use cases for Array.isTemplateObject do we think presenting this as such would alleviate concerns in tc39?

The only way to spoof this would be using eval which csp and trusted types deals with?

function otherTag(ts) {
return ts
}
fromLiteral(otherTag`foo`)
koto commented 10 months ago

https://github.com/tc39/proposal-array-is-template-object/issues/10 has the most context, I think, but I believe they were a bit deeper than what we discuss here. The concerns might be alleviated if there is no Array.isTemplateObject function exposed, which is what we in the end did in fromLiteral, with the new approach using only the [[TemplateMap]] slot from ECMAscript in host algos without needing Array function.

cc @syg

mbrodesser-Igalia commented 10 months ago

To continue over-communicating. Supporting

function otherTag(ts) {
return ts
}
fromLiteral(otherTag`foo`)

seems unnecessary for web-developers and there's no benefit of using that instead of the shorter:

fromLiteral`foo`

However, supporting the former has no security impact.

koto commented 10 months ago

Supporting [tag smuggling] seems unneccessary [..] but has no security impact

Correct, that's simply the only way we could identify to support this using the existing spec wiring, so it's "good enough" for our case.

lukewarlow commented 10 months ago

I can see tag smuggling being useful if you've got your own tagged templates such as say lit htmls html tag.

You could just directly forward to fromLiteral and then fallback to sanitising etc and using policies if it's not a literal.

mbrodesser-Igalia commented 10 months ago

Postponing this issue to v2. In case of concerns, please raise them.

caridy commented 10 months ago

Again, at this point for me it is not really about whether or not it will work, it is about whether or not this is a good model. At TC39, we often look at features from the lens of developers, I like to use the ZPD (Zone of Proximal Development), and I can't see how this will be learned by developers, or how this new knowledge can be transferred to any feature in the future. It feels that we are bending backward or doing gymnastics to try to reuse existing features of the language for the propose of signing strings. What are the alternatives? What about new syntax? What about defer evaluation or module blocks? If the content to be evaluated must be static in order to be considered safe, how is that different from other features that we are exploring to do defer evaluation (proposed by Mozilla)? Just food for thoughts.

koto commented 10 months ago

That's very valuable context, thanks @caridy.

This all has to be weighted against the actual use case though. The use case is specific to the web platform, as DOM XSS, especially in existing websites is its unique problem. Fixing XSS by requiring to rewrite an application from scratch fails (case in point: unsafe-eval didn't eliminate eval's usage). Guarding dynamic code execution that is more precise than unsafe-eval and that doesn't require to throw away eval is a necessity today, and TT have a track record showing that this approach works for the existing websites - more than any other approach tried in the past.

Worth noting, fromLiteral seems to be both easy in implementation and solves a real problem for web platform, facilitating migrating to safe APIs. From what I understand, the argument isn't even about fromLiteral per se, but dynamic JS code evaluation hooks, no? Even without fromLiteral, TT support eval(TrustedScript) when that value is created through TT policies. To JS it shouldn't matter how a value to be executed was created, because it's the hosts concern to block or allow eval payloads. In that sense, fromLiteral is just an optimization over web platform's TT policy-based API.

shhnjk commented 10 months ago

Here is my opinion as a person who deployed Trusted Types outside of Google.

If anyone tries to deploy Trusted Types, they would need a library equivalent to fromLiteral. This is because there are so many constant string assignments to XSS sinks, that it would be much easier to create a library for it than to add those constant strings one-by-one to a custom TT policy.

However, that creates a problem because developer can bypass the library if they figure out how to bypass the check. This is not a problem for Google, because they have compile-time checks for those unsafe practices, but I'm sure most other companies don't.

This is why I belieave we should have fromLiteral in the Web Platform, because I don't think all sites that what to deploy Trusted Types has as many resources as Google to build compile-time checks for unsafe practices. Because if they had that much resources, they don't need Trusted Types (i.e. Google had Trusted Types equivalent internally before Trusted Types was available in the Web Platform).

caridy commented 10 months ago

@koto

This all has to be weighted against the actual use case though. The use case is specific to the web platform, as DOM XSS, especially in existing websites is its unique problem. Fixing XSS by requiring to rewrite an application from scratch fails (case in point: unsafe-eval didn't eliminate eval's usage).

I hope we can generalize this. I'm still trying to understand the different use-cases, but from what I know so far, a generalization of this seems possible that can be somehow lifted to the language with syntax, and useful for all runtimes.

Worth noting, fromLiteral seems to be both easy in implementation and solves a real problem for web platform, facilitating migrating to safe APIs. From what I understand, the argument isn't even about fromLiteral per se, but dynamic JS code evaluation hooks, no?

No doubt that in its current form, it is easy to implement, and it is going to be useful right away, that's not the concern. Having the right JS code evaluation hooks is the goal, yes, but my argument is that using template literals or tag functions, or anything that exists today in our syntax is the wrong approach, and will bite us back fairly quick. The fact that we are using the api + syntax as the signaling mechanism to provide the right hooks is concerning, specially because this is, as far as I can tell, the first time we are doing that, and you're not allowing all equivalent syntax, but a very particular syntax with a particular api.

lukewarlow commented 9 months ago

Worth noting, fromLiteral seems to be both easy in implementation and solves a real problem for web platform, facilitating migrating to safe APIs.

@koto btw to provide some clarity my "issue" with fromLiteral is purely around the implementation "difficulties", I personally believe it to be a valuable aspect of the spec and if not included in v1 should be a fast follow up.

I have a prototype for WebKit (https://github.com/webkit/webkit/compare/lukewarlow:from-literal) which at first glance appears to work but it's using a different implementation not based on TemplateMap. This is for 2 reasons efficiency (as specced this isn't an efficient check) and ease of implementation (the TemplateMap slot within JSC isn't necessarily easy to access within WebCore).

My implementation is largely based on https://tc39.es/proposal-array-is-template-object/ but without exposing the IsTemplateObject method itself.

Provided this implementation is identical to Chromium's from an end user code perspective (stuff such as cross realm checks) AND it's implementable in Gecko, I'd be comfortable keeping fromLiteral in the spec.

I've done an initial check (passing the tagged template parameter from an iframe through postMessage and it fails to work in both Chromium and WebKit, I'm unsure if there's other edge cases where this would be an observably different implementation.

@mbrodesser-Igalia can you speak to whether this approach will work in Gecko?

lukewarlow commented 9 months ago

To provide an update I've done some testing of cross-realm behaviour and it is indeed different to chromiums.

Specifically console.error('TrustedHTML.fromLiteral doesnt throw an error as expected'); logs an error in my WebKit build but NOT in chromium.

This is an observable difference, I'm not sure how bad the difference is for our use case.

One alternative idea was rather than doing array.[[IsTemplateObject]] do array.[[TemplateObjectOriginalRealm]] and do a realm check too. Array.isArray provides a same realm check but I'm not sure how that works in C++ land.

const iframe = document.createElement('iframe');
iframe.srcdoc = `
<script>
  window.templateArray = (x => x)\`\`;
</script>
`;
iframe.onload = () => {
    const { templateArray, TrustedHTML: TrustedHTMLFromRealm } = iframe.contentWindow;
    if (Array.isArray(templateArray) && !(templateArray instanceof Array)) {
        console.log('templateArray is an array from a different realm');
    } else {
        throw new Error('templateArray is not an array from a different realm');
    }

    try
    {
        TrustedHTML.fromLiteral(templateArray);
        console.error('TrustedHTML.fromLiteral doesnt throw an error as expected');
    } catch (e) {
        console.log('TrustedHTML.fromLiteral throws an error as expected');
    }

    try
    {
        TrustedHTMLFromRealm.fromLiteral(templateArray);
        console.log('TrustedHTMLFromRealm.fromLiteral doesnt throw an error as expected');
    } catch (e) {
        throw new Error('TrustedHTMLFromRealm.fromLiteral throws an error');
    }
}

document.body.appendChild(iframe);
mbrodesser-Igalia commented 9 months ago

Worth noting, fromLiteral seems to be both easy in implementation and solves a real problem for web platform, facilitating migrating to safe APIs.

@koto btw to provide some clarity my "issue" with fromLiteral is purely around the implementation "difficulties", I personally [...] @mbrodesser-Igalia can you speak to whether this approach will work in Gecko?

Don't know yet.

littledan commented 8 months ago

Should we be working on Array.isTemplateLiteral in TC39? It isn’t stalled due to any particular reason, just no one is pushing it forward. It was kinda blocking on interest from trusted types folks, which there seems to be.

erights commented 8 months ago

If they're interested, they should push it forward?

lukewarlow commented 8 months ago

@littledan I for one think it would be a good thing to try and progress.

littledan commented 3 months ago

Array.isTemplateLiteral did not move forward in TC39 due to concerns raised by Mozilla, as expressed by @dminor. I'm honestly neutral on whether we include the JavaScript-level API, and mostly wanted to unblock fromLiteral. I'd be interested to hear more about Mozilla's views here.

dminor commented 3 months ago

Array.isTemplateLiteral did not move forward in TC39 due to concerns raised by Mozilla, as expressed by @dminor. I'm honestly neutral on whether we include the JavaScript-level API, and mostly wanted to unblock fromLiteral. I'd be interested to hear more about Mozilla's views here.

Although we agree it may be useful to have an abstraction like Array.isTemplateLiteral in JavaScript, it's not clear to us if it is the correct abstraction to use or not, and it's not currently a priority for us to investigate further.

I think it would be best to not consider Array.isTemplateLiteral a blocker or prerequisite for fromLiteral, as there seem to be ways ahead without it.

lukewarlow commented 3 months ago

We'd need the underlying mechanics that isTemplateLiteral used. But as you say we don't need the actual exposed JS function for our use case. Would it be possible to add those without any exposed use for Trusted Types to build off of? Or would we need to do some degree of patching in this spec?