mikewest / tc39-proposal-literals

Literals could be different than non-literals.
32 stars 4 forks source link

Suggestion for details: Add this as a feature of templates #2

Open littledan opened 6 years ago

littledan commented 6 years ago

Would it be a problem to ask library users to use template string syntax, rather than literal strings? Could WebIDL expose template tags? If so, maybe this would be easier to specify and implement as a property of templates, rather than of strings.

Here's the API surface I'm imagining: the first argument to a template tag would have an additional property, literal, which indicates whether the string provided as an argument to the template was a literal. We could make this unforgeable by representing this as an internal slot, with literal as an own getter, so you could do something like this to prevent an attacker from calling your template with any old object that has a literal: true property:

// Un-monkey-patchable way to get the getter
let getLiteral = Object.getOwnPropertyDescriptor((_ => _)``, "literal").get; 

function literalString(strings, ...keys) {
  if (!getLiteral.call(strings)) throw new Error();
  return String.raw(strings, ...keys);
}

This literalString template tag is like String.raw but would throw an exception if you don't pass a literal. It outputs a normal string, with no trace any more that it was literal. This two-liner could be the recommended way to set off a call to something that requires a literal string. It's not possible to compromise the string literal contents because the strings object (and its inner raw object) are frozen.

mikewest commented 6 years ago

Hey Daniel! This seems like a reasonable narrowing of the scope in general, and a better explanation of how the thing might work in practice than I had in https://github.com/mikewest/tc39-proposal-literals#strict-tag-functions. :)

That said, I'd very much like to expose the "literal"-ness of the string in some way to the platform so that DOM could implement some built-in template tags that required literal strings. That is, one goal is to define some sort of fromLiteral factor for TrustedUrl objects, such that TrustedUrl.fromLiteral`This is a literal string`; could be implemented as part of DOM, with the checks enforced in C++. I assume that if we could access the "literal" property from JS, we'd be able to do so from C++ as well, though I admit I'm not familiar enough with that bit of Blink's bindings to know off the top of my head how we'd do so.

/cc @koto, @slekies

littledan commented 6 years ago

I believe the literal-ness should be just as visible to the platform as it would be to JavaScript, but it might take some more work in WebIDL and the Blink bindings layer to do the plumbing.

mikewest commented 6 years ago

I pulled this suggestion into the document wholesale at https://github.com/mikewest/tc39-proposal-literals#strict-tag-functions. Thank you!

domenic commented 6 years ago

I like this variant a lot, FWIW.

allenwb commented 6 years ago

So:

'xy' === 'x'+'y'; //'true
'xy'.literal; //true
('x'+'y').literal ;  //false

That's highly disturbing, given the usual semantics of ===

Also consider:

let str=eval(`'${"a"+"b"}'`); //equivalent to eval("'ab'");
str.literal;  //presumably true, because the parser sees the litera 'ab'

So, a string value is primitively tagged as "a literal" even though it may have been arbitrarily constructed at runtime. I appreciate that @mikewest is thinking of use cases where eval has probably been disabled. But a semantics of a fundamental ES type (ie string) should be meaningful in all contexts.

littledan commented 6 years ago

@allenwb In what I'm proposing, those would always be undefined--the literal property is just on the first argument for tagged templates.

allenwb commented 6 years ago

@littledan OK, but the eval observation still stands.

Also, I guess I don't see how the literalString tag (as shown above) actually helps. First of all, by definition a lexical grammar Template is a literal. So, you literalString function will never throw when used as a tag function of a TemplateLiteral, even if substitutions are provided. At the very least you should throw if any substitution values are provided.

If I understand your suggestion, you want libraries (eg, the web platform) to "ask" that calls to certain use literalString to condition string values passed to the API. EG:

   someAPICall(literalSring`first arg`, literalString`second arg`);

A library could ask, about how would it enforce that a client actually does it? I guess, the library would have to provide a tag that both internally did the literal test and then returned a privately branded wrapper object and the API call would then reject any non-wrapped values. EG:

   someAPICall(someAPILiteralStringBrander`first arg`, someAPILiteralStringBrander`second arg`);

If that's you intent you probably need to clarify it.

mikewest commented 6 years ago

If I understand your suggestion, you want libraries (eg, the web platform) to "ask" that calls to certain use literalString to condition string values passed to the API.

If we ran with @littledan's suggestion, then we'd forget all about the first sketchy proposal in the doc. In that case, I would like libraries (e.g. the web platform) to be able to define template tags that required the first argument to be a literal. Coming back to the TrustedUrl example in the proposal, we'd have some new WebIDL construct that defined a tag function (let's call it tag, for grins):

interface TrustedUrl {
  // Stuff goes here.

  static tag fromLiteral(DOMString url);
};

Developers would poke at it as described:

// This returns a `TrustedUrl` object:
var url = TrustedUrl.fromLiteral`https://an.example/`;

// This throws a `TypeError`:
var url = TrustedUrl.fromLiteral(inputElement.value);

This might be what you meant by "I guess, the library would have to provide a tag that both internally did the literal test and then returned a privately branded wrapper object and the API call would then reject any non-wrapped values."? If so, then yes! Exactly that! :)

(Forgive me for focusing on the library side of things: it's all I know.)

littledan commented 6 years ago

@allenwb The idea is that you could use this literal getter to check whether it was a real template literal or whether it was an object that a user made to look like one (and simply called the template tag as a function). Maybe we should make literal return false if you provided any substitution. For the eval concern, I imagine that CSP (i.e., HostEnsureCanCompileStrings returning false) may be able to block that eval.

allenwb commented 6 years ago

@mikewest What I meant is the the value return as url would have to have been given a privately access brand that says it was generated using literalString.

Also while

// This throws a `TypeError`:
var url = TrustedUrl.fromLiteral(inputElement.value);```

would indeed throw, ~I could write a function (using techniques similar to what Dan showed for literalString) that could be used to by-pass that check:~

~// This won't throws a TypeError:~ ~var url = TrustedUrl.fromLiteral(masqueradeAsLiteral( inputElement.value));~

~If it can be bypassed that simply it's not really security but more on the order of training wheels. Is that good enough? ~

see https://github.com/mikewest/tc39-proposal-literals/issues/2#issuecomment-345755257

domenic commented 6 years ago

I might not be fully understanding what's going on here, but I do think there's at least some confusion where people are not properly separating the internal slots level and the JS API level. Web platform APIs like TrustedURL.fromLiteral would consult unforgeable internal slots. There's no need for them to be written as Dan has done.

As usual, we can have a getter exposing that, but it'll be forgeable without safe-meta-programming. Conversely, you should not be able to write a masqueradeAsLiteral function that somehow fools the internal slot check.

littledan commented 6 years ago

@domenic Yes, that's what I meant. I don't see how @allenwb 's examples would evade things.

@allenwb I don't think strings need any more complexity. The technique I was showing wasn't a bypass technique, but rather a technique to have a reliable internal slot checker to be used against the template object.

allenwb commented 6 years ago

Ok, I agree that it would be possible in GetTemplateObject to brand all template objects by giving them an internal branding slot. We'd also have to expose a brand check function (Array.isTemplate since template objects are arrays??).

That should be sufficient for library code to unforgably test whether an object is a "real" template object created by evaluating a TemplateLiteral.

littledan commented 6 years ago

Why would you need the second function? The own getter which checks for a literal brand would be insufficient?

allenwb commented 6 years ago

It's mostly a difference in ugliness. Compare (all other things being equivalent):

let getLiteral = Object.getOwnPropertyDescriptor((_ => _)``, "literal").get; 

function literalString(strings, ...keys) {
  if (!getLiteral.call(strings)) throw new Error();
  return String.raw(strings, ...keys);
}

to

function literalString(strings, ...keys) {
  if (!Array.isTemplateLiteral(strings)) throw new Error();
  return String.raw(strings, ...keys);
}

Why define something in a way that demands use of such an ugly, error prone idiom?

mikewest commented 6 years ago

I agree that Array.isTemplateLiteral is prettier. As long as it's also unforgeable, that sounds like a fine approach to me.

littledan commented 6 years ago

Oh, I see. I wasn't really designing for prettiest, just something simple that would work reliably and be easy to implement. However, it comes with the major pifall that you could accidentally use the literal getter directly, even as that is forgable.

Unfortunately, Array.isTemplateLiteral is forgable by either replacing that function with another one or by replacing Array on the global object.

domenic commented 6 years ago

Safe metaprogramming is never going to be pretty. Everything in this thread has so far been susceptible to forgery, sometimes by replacing Function.prototype.call. We shouldn't worry about the prettiness when trying to be unforgeable. The platform will just brand-check, and people working in a hostile environment that need safe metaprogramming will write ugly code. People working in a non-hostile environment should get a reasonable syntax, so I prefer the getter for the surface syntax.

allenwb commented 6 years ago

Array.isTemplateLiteral is no more or less forgabe then any other built-in. Early capture is your friend.

It's also easy to validate if you are using a good Array.isTemplateLiteral:

const isLit = Array.isTemplateLiteral;
const nonlit = [""]; nonlit.raw=Object.freeze([""]);
Object.freeze(nonlit);

if (!isLit`` || isLit(nonLit)) throw Error("bogus Array.isTemplateLiteral");

or you could provide a unforgable platform specific version of Array.isTemplateLiteral

or the isTemplateLiteral function could be made available using an built in module

The problem with the original suggestion of a literal getter on template objects is that it is an attractive nuisance for usage that is susceptible to forged templates:

const maybeLit= {get literal() {return true};
if (maybeLit.literal) {
   //ok we "know it  is  a template literal  (or we...)
littledan commented 6 years ago

@domenic Oh, good point. Well, that could be addressed by instead having the literal function be an own data property, and have the function take a template as an argument rather than receiver. Then you avoid Object.getOwnPropertyDescriptor and Function.prototype.call. Maybe this would also help with the attractive nuisance problem which @allenwb and I mentioned above a little bit.

allenwb commented 6 years ago

As previous mentioned, this whole style of testing is only valid if eval is not available. Any platform that is able to unilaterally decide that eval is unavailable should be able to also make Array.isTemplateLateral unforgable.

domenic commented 6 years ago

As I tried to explain above, I'm not worries about attractive nuisances. If you're looking for good enough, use .literal (getter is fine, no need for data property). If you're looking for safe meta-programming, save functions and write ugly code. (You can save an "uncurry-this"ed version of the getter.) If you're the platform, use internal slots.

As such I think we should use a surface syntax which reflects the programming model. It's a property of the template object whether or not it's from a literal. Thus a getter on the template object seems best, instead of some Array-wide predicate.

koto commented 6 years ago

Putting the forgeability and eval issues aside, how big of a problem would it be to be able to check for "literalness" of the standard, raw strings, not just string literals?

For example, assuming Array.isTemplateLiteral is not forged, can we have:

let lit = 'string'
Array.isTemplateLiteral([lit]) // true
let notlit = 'string' + Math.random()
Array.isTemplateLiteral([notlit]) // false

The reason I'm asking is that one important use case fro literalness is to be able to distinguish the provenance of a String in web applications. If the string was created as a literal, it's a strong signal that the string is wholly author-controlled - if it's not, it might have been constructed of user-controlled source, which may cause a DOM XSS vulnerability, when used with some DOM XSS sink functions/properties like Element.innerHTML.

Beign able to "trust" raw string literals (not template literals only) allows us to create a more secure DOM API, while at the same time significantly reduce its adoption costs, as most of the existing calls already use (presumed safe) literals. So, for example, we could have:

anEl.innerHTML = 'https://an.example/'; // does not throw
enEl.innerHTML = inputElement.value; // throws

which does not require rewriting all of the codebase.