Open ethanresnick opened 2 years ago
Related: #41114
which would be the supertype of all strings that are declared literally in the program's source text (or derived from such strings, e.g. by concatenating two of them)
From a performance perspective, this seems like a nightmare, and possibly even undecidable given how TS's type system works. Is there a workable formal definition of this?
@RyanCavanaugh For the performance concerns, I don't know the TS implementation well enough to understand the issue. Can you elaborate a bit? The linked PEP did seem to imply that this ended up being simple to implement for Python, which gives me some hope; but, of course, that might not translate to TS for a million reasons.
As far as a definition goes, what type of "formality" did you have in mind? I'm not sure what would be helpful, but I'll try to give some examples...
This is the simplest case: x is inferred as a literal type today, so it's assignable to LiteralString:
const x = "hello";
let y: LiteralString = x;
Beyond that, the most common way of combining literal strings is probably with + (including for multi-line strings), so I think an assignment like the below would need to work:
const x = "hello" + " world";
let y: LiteralString = x;
If the type of x above could be inferred as "hello world" rather than string, then this collapses into the first example. Changing this intrinsic behavior of + seems like it could be a useful independent change, but it'd be critical here.
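To make the gap concrete, here is a minimal sketch of what current TypeScript infers for + and how a helper can keep the literal type. The concat function is a hypothetical userland helper, not something TS provides; it relies only on template literal types.

```typescript
// Hypothetical helper (not part of TypeScript): concatenates two strings and
// keeps a literal result type when both inputs have literal types.
function concat<A extends string, B extends string>(a: A, b: B): `${A}${B}` {
  // `+` itself is typed as `string`, so a cast is needed to claim the
  // template literal type.
  return (a + b) as `${A}${B}`;
}

// With the built-in operator, the literal type is lost today.
const plain = "hello" + " world"; // inferred as string

// With the helper, the template literal type collapses to a plain literal type.
const typed = concat("hello", " world"); // inferred as "hello world"
```

If + behaved like concat at the type level, the assignment to LiteralString above would reduce to the simple literal case.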
Concatenating w/ template strings would ideally be supported too:
const x = "hello";
const y = `${x} world`;
let z: LiteralString = y;
I think this builds pretty straightforwardly on the above. It also seems like the constant expression machinery for enums might be applicable.
In addition to literal types being assignable to LiteralString, existing LiteralStrings can be concatenated with each other, with the result being a LiteralString.
let a: LiteralString = "SELECT * from foo";
if (applyLimit) {
  a += " LIMIT 1"; // assignment should succeed.
}
More generally, if typeof s is LiteralString, the expressions s + 'xyz' and `${s}xyz` could be typed as just LiteralString, or TS could try to preserve more info by producing the type `${LiteralString}xyz`.
I think this gets tricky with widening. I.e., in let x = "hello", is x typed as a string or LiteralString? My understanding is that TS cannot easily support something like the below:
declare const a: string;
let query = "SELECT * from foo";
await executeQuery(query); // ok. typeof query = LiteralString
query += a; // typeof query silently changes to string upon concatenating the non-literal string `a`
await executeQuery(query) // this now fails
Assuming the above can't be supported, I think we'd have to keep the current behavior where let x = "hello" types x as simply string. People who want to imperatively build up a LiteralString from pieces would have to explicitly use an annotation:
let query: LiteralString = "....";
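The widening behavior this depends on can be observed in current TypeScript. The sketch below uses a hypothetical IsLiteral helper type (an approximation: it also reports true for template types such as `${string}xyz`, which the proposal would exclude).

```typescript
// Hypothetical helper type: true when S is narrower than `string`.
type IsLiteral<S extends string> = string extends S ? false : true;

const viaConst = "hello"; // type "hello": const declarations keep the literal type
let viaLet = "hello";     // type string: let declarations widen the literal

// These annotations only compile if the inferred types are as described.
const constIsLiteral: IsLiteral<typeof viaConst> = true;
const letIsLiteral: IsLiteral<typeof viaLet> = false;
```

This is why an un-annotated let would not produce a LiteralString without changing the existing widening rules.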
Finally, for LiteralString to be useful more generally, there'd need to be overloads for some of the built-in string functions to preserve LiteralString-ness. (The Python proposal has a list of these.)
// again, an explicit annotation's probably needed here, or `as const`;
// otherwise, `conditions` inferred as just `string[]`.
const conditions: LiteralString[] = [
  "status = 'published'",
  "created_at > '2022-01-01'",
  "author_id = ?"
];
await query(`SELECT * from posts WHERE ${conditions.join(' AND ')}`);
So, there's an implicit overload on join (assuming query only accepts LiteralString):
interface Array<T> {
  join(separator?: LiteralString): T extends LiteralString ? LiteralString : string;
}
The basic idea would be that any deterministic operation involving only LiteralStrings should be thought of as producing a LiteralString.
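As a rough illustration of that rule, the sketch below emulates LiteralString in userland with a branded type. Everything here (the brand, lit, and joinLiteral) is hypothetical scaffolding rather than the proposed built-in, and the brand can still be forged with a cast.

```typescript
// Userland stand-in for the proposed built-in type (an assumption for this sketch).
declare const literalBrand: unique symbol;
type LiteralString = string & { readonly [literalBrand]: true };

// Entry point: accepts only arguments whose type is narrower than `string`.
// When S is inferred as `string`, the parameter type becomes `never`.
function lit<S extends string>(s: string extends S ? never : S): LiteralString {
  return s as unknown as LiteralString;
}

// A deterministic operation over LiteralStrings yields a LiteralString.
function joinLiteral(
  parts: readonly LiteralString[],
  separator: LiteralString
): LiteralString {
  return parts.join(separator) as LiteralString;
}

const conditions = [lit("status = 'published'"), lit("author_id = ?")];
const where = joinLiteral(conditions, lit(" AND "));

// Rejected at compile time, because the argument is typed as plain `string`:
// declare const userInput: string;
// lit(userInput);
```

With a built-in type, the lit wrapper would be unnecessary, since literal types would be assignable to LiteralString directly.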
For some of these overloads (especially of methods that live on strings), I'm not sure if TS supports a good place to put them. E.g., how would we specify that calling toUpperCase() on a LiteralString produces a LiteralString?
That said, I think defining LiteralString overloads for the built-in methods is the least important part of this proposal. Many times (maybe the majority?) the final literal string will just be written inline, without the user building it up from other literal strings. E.g., you'll just be doing: query("SELECT .... WHERE x = ?", [paramValue1]). For the remaining times when a LiteralString is built up from sub-components, my guess is that + and join together cover many of the cases. If there are occasional remaining cases where an overload can't easily be provided, then a cast isn't the worst thing, e.g., myLiteralString.split("\n") as LiteralString[].
Ah, I was taking this much more literally (ha!) that LiteralString would actually be a union of all the literally-written strings in the program.
In terms of TypeScript relative to Python, I think there'd be a very difficult cognitive leap at the point where the runtime behavior crosses into the type system behavior. I believe with the definitions given, this program is supposed to have an error, but it seems like a hard sell:
function foo(x: "bar") {
  fn(x);
}
function fn(x: LiteralString) {
}
@RyanCavanaugh Now I'm confused haha. Why would the example code you showed have an error? The type of x is "bar", which is a literal type, so it would be assignable to LiteralString just fine (when calling fn). I'm also not following your comment about the runtime and type system interaction; in TS, this would be a purely compile-time check, which is how it works in Python too (and Scala iiuc).
Ah, I was taking this much more literally (ha!) that LiteralString would actually be a union of all the literally-written strings in the program.
Totally my fault! I see how the original text implied that. I’ve updated the OP to hopefully make it much clearer what I’m actually proposing
Why would the example code [Ryan] showed have an error?
I think the implication was that despite "bar" being a literal type, the specific value of x at runtime is not guaranteed to originate in source code. Of course with a single literal type that doesn't make any sense, but the problem becomes much clearer if you imagine the type involved is "foo" | "bar" | "baz". It seemed like your intent was that only hard-coded strings are assignable to the proposed type; the string doesn't ever get to be chosen out-of-band (e.g. by the caller of a function).
For now I found a shitty-workaround for this:
const fn = <const S extends string>(str: string extends S ? never : S) => {}
fn("test") // passes
fn("test" as string) // fails
🔍 Search Terms
literal string, xss, sql injection, security, user input handling
⭐ Suggestion + Motivating Example
The idea is to add a built-in type called LiteralString, which would be the supertype of all literal string types. I.e., LiteralString is inhabited by all the subtypes of string, excluding string itself and template string types that contain string. In addition to introducing this type, TS would be more careful about tracking whether a string has a literal type (e.g., when two strings with literal types are concatenated with +, the result would remain a literal type, rather than becoming string).

The motivation here is to allow the type system to check that certain security-sensitive strings haven't been unsafely manipulated by user-controlled input. For example, one could write a function like queryDb(query: LiteralString, params?: unknown[]): Promise<Results> to enforce that the query string does not have any values interpolated into it that could've been user-controlled and created SQL injection vulnerabilities. The idea is that the value from user input would've had to be typed as string, which can't be mixed into a LiteralString without producing a string, which would then not be an acceptable input to queryDb.

There is a bunch of prior art for such a type, with identical motivation, including the LiteralString type in Python. There was also a proposal to have JS engines track whether a string was created entirely from literals, which would've been used to allow DOM APIs like innerHTML to treat literal strings as safe, as part of a broader strategy to protect against XSS. (Of course, this TS proposal is compile-time only, but the motivation is the same.) Additionally, there was/is an analogous type in Google's Closure Compiler, with the same motivation. Finally, Scala has an analogous type, Singleton, which is inhabited by all literal types.

Potentially, the built-in type could be called Literal, rather than LiteralString, and could also include other kinds of literals (numbers, bigints, etc); APIs which need a string would then do Literal & string, or TS could provide LiteralString as a built-in alias.

I guess there's an argument that tracking all literal values in the same way, and having a unified Literal type, is more elegant, and perhaps there are some use cases outside of security for which such a type would be valuable. For the security use case, though, if an API takes a non-string, and you pass user input to that API (or some value derived from user input), it seems almost certain that you intended to let the user control the API with their input. In these non-string cases, there's nothing analogous to the "you intended to allow the user to provide some data, but they tricked the system into interpreting that data as code" problem that's at the heart of SQL injection, XSS, and related vulnerabilities.

Given all that, I guess I'd propose starting with only LiteralString, as that's presumably less effort to implement and adds less overhead to compile times. If legitimate use cases for a more general Literal type arise, then it's easy to implement that later and redefine LiteralString as Literal & string.
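For a sense of how the queryDb idea behaves, here is a sketch that approximates it in current TypeScript with a conditional-type guard instead of the proposed LiteralString. The PreparedQuery shape and the synchronous return value are assumptions made to keep the example self-contained; a real function would talk to a database and return a Promise.

```typescript
// Hypothetical result shape used only for this sketch.
interface PreparedQuery {
  text: string;
  params: readonly unknown[];
}

// Approximation of `queryDb(query: LiteralString, params?: unknown[])`:
// when S is inferred as plain `string`, the parameter type becomes `never`,
// so only arguments with a narrower (literal) type are accepted.
function queryDb<S extends string>(
  query: string extends S ? never : S,
  params: readonly unknown[] = []
): PreparedQuery {
  return { text: query, params };
}

// Accepted: the query text is written literally, values go in as parameters.
const prepared = queryDb("SELECT * FROM posts WHERE author_id = ?", [42]);

// Rejected at compile time: interpolating a `string` widens the whole template.
// declare const userInput: string;
// queryDb(`SELECT * FROM posts WHERE author_id = ${userInput}`);
```

Unlike the proposed type, this guard only works at the call site: the literal-ness cannot be stored in a variable annotation or returned from a helper, which is part of what a built-in LiteralString would add.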