tc39 / proposal-extractors

Extractors for ECMAScript
http://tc39.es/proposal-extractors/
MIT License

This is a syntax for runtime types in JavaScript #20

Open eemeli opened 3 months ago

eemeli commented 3 months ago

It finally dawned on me what this proposal allows for. Consider this class, an evolution of one @rbuckton first mentions in https://github.com/tc39/proposal-extractors/issues/18#issuecomment-2040769594:

class Point {
  #x;
  #y;
  constructor(x, y) {
    this.#x = x;
    this.#y = y;
  }
  get x() { return this.#x; }
  get y() { return this.#y; }
  static [Symbol.customMatcher](subject) {
    return #x in subject ? [subject] : false;
  }
}

With that, I can write code like this:

function drawLine(Point(p1), Point(p2)) { … }

const Point(p) = getPoint(…);

const Point({ x, y }) = getPoint(…);

match (p) {
  when Point({ let x, let y }): …;
}

In other words, by returning a [subject] array from the custom matcher method, I'm effectively ignoring the "destructuring" part, and ending up with JS code that does runtime type checking. That's... really powerful.
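
Since no engine ships extractors yet, the effect described above can be approximated in today's JavaScript. This is only a minimal sketch: the `customMatcher` fallback symbol and the `extract` helper are hypothetical stand-ins, not part of the proposal, and the sketch assumes that a failed match in a binding position surfaces as a TypeError.

```javascript
// Stand-in for the proposed well-known symbol; falls back to a registry
// symbol because current engines do not define Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

class Point {
  #x;
  #y;
  constructor(x, y) {
    this.#x = x;
    this.#y = y;
  }
  get x() { return this.#x; }
  get y() { return this.#y; }
  static [customMatcher](subject) {
    // `#x in subject` throws for non-objects, so guard primitives first.
    const isObject = typeof subject === "object" && subject !== null;
    return isObject && #x in subject ? [subject] : false;
  }
}

// Hypothetical helper approximating `const Matcher(value) = subject`.
function extract(Matcher, subject) {
  const result = Matcher[customMatcher](subject);
  if (result === false) {
    throw new TypeError(`${Matcher.name} did not match the subject`);
  }
  return result[0];
}

// Roughly what `const Point({ x, y }) = new Point(1, 2)` would do.
const { x, y } = extract(Point, new Point(1, 2));
console.log(x, y); // 1 2
```

Because the matcher returns `[subject]`, the helper hands back the very same object it was given, which is what makes the pattern behave like a type assertion rather than a decomposition.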

And it makes me think that perhaps the code above could be clearer if it didn't include the parentheses:

function drawLine(Point p1, Point p2) { … }

const Point p = getPoint(…);

const Point { x, y } = getPoint(…);

match (p) {
  when Point { let x, let y }: …;
}

I would find that much more readable, and less confusing because it doesn't look like the value after the class name is an input.

We could also get a lot of the type-checking power of this by having a default matcher implementation that relied on an instanceof check (which is already customizable via @@hasInstance) returning [subject] on success. With that, the above code would also work even with this Point implementation:

class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}
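
A default matcher of that kind might look roughly like the following. It is a sketch under assumptions: `matchWithDefault` and the fallback symbol are hypothetical names used for illustration, and nothing here is specified by the proposal.

```javascript
// Stand-in symbol; current engines do not define Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

// Hypothetical fallback: use the class's own matcher if it defines one,
// otherwise behave like an `instanceof` check that yields [subject].
function matchWithDefault(Matcher, subject) {
  if (typeof Matcher[customMatcher] === "function") {
    return Matcher[customMatcher](subject);
  }
  return subject instanceof Matcher ? [subject] : false;
}

class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}

console.log(matchWithDefault(Point, new Point(1, 2)) !== false); // true
console.log(matchWithDefault(Point, { x: 1, y: 2 }));            // false
```

Since the fallback goes through `instanceof`, a class could still customize it via `Symbol.hasInstance` without defining a matcher of its own.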

littledan commented 3 months ago

Yes, Asumu Takikawa made this point in committee in March 2023, here's a slide. I don't think this requires any removal of parentheses, though--the syntax is already very clear to me, and extending the JS grammar gets really complicated if we define Identifier Identifier to do something.

rbuckton commented 3 months ago

Extractors are not the correct feature to build runtime types on. While you can use extractors for runtime assertions, their purpose is to augment destructuring. IMO, a better feature upon which to build runtime types would be Decorators. For any runtime types implementation, I believe you need more than just input validation for them to be feasible, you also need introspection/reflection.

Extractors do not provide reflection. You cannot take a function f(Point(x, y)) {} function object and reflect that the first argument takes a Point, just as you also cannot take a const { p1: Point(x, y) } = obj to reflect over p1 being a Point.

Decorators do provide reflection via context.metadata and can intercept the outermost declaration (i.e., the parameter), but cannot be deeply nested in a binding pattern. Decorators and Extractors can overlap at that boundary, but Decorators face outward (describing information about the declaration to consumers), while Extractors face inward (breaking down inputs).

IMO, the following is a better basis for runtime types:

@Returns(Number)
function add(@Type(Number) x, @Type(Number) y) {
  return x + y;
}
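
Parameter decorators are not something that can be run today, so the reflection argument can only be illustrated with a plain wrapper. The sketch below is an approximation, not the decorator API: `typed` and the `signatures` registry are hypothetical names standing in for `@Type`/`@Returns` and `context.metadata`.

```javascript
// Hypothetical registry standing in for decorator metadata.
const signatures = new WeakMap();

// Hypothetical helper: wraps `fn`, validates arguments and the return value,
// and records the declared types so that callers can reflect on them.
function typed(paramTypes, returnType, fn) {
  const check = (Type, value, label) => {
    // Accept primitives via typeof and objects via instanceof.
    const ok = typeof value === Type.name.toLowerCase() || value instanceof Type;
    if (!ok) throw new TypeError(`${label} must be a ${Type.name}`);
  };
  const wrapped = (...args) => {
    paramTypes.forEach((Type, i) => check(Type, args[i], `argument ${i}`));
    const result = fn(...args);
    check(returnType, result, "return value");
    return result;
  };
  signatures.set(wrapped, { paramTypes, returnType });
  return wrapped;
}

const add = typed([Number, Number], Number, (x, y) => x + y);

console.log(add(1, 2));                              // 3
console.log(signatures.get(add).paramTypes[0].name); // Number
```

The second `console.log` is the outward-facing part: a consumer holding only `add` can discover what it accepts, which an extractor in the parameter list would not expose.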

And it makes me think that perhaps the code above could be clearer if it didn't include the parentheses:

function drawLine(Point p1, Point p2) { … }

const Point p = getPoint(…);

const Point { x, y } = getPoint(…);

match (p) {
  when Point { let x, let y }: …;
}

The const Point { x, y } syntax was already proposed as part of this proposal, but has subsequently been rejected as it eats up too much syntax space in assignment patterns. For example, if foo{ bar } = obj were to be supported, we could never use identifier { for any other future syntax without an insanely complex cover grammar. This is immediately relevant because the following is already legal JS and would run afoul of an ambiguity:

class C {
  static {}
}

rbuckton commented 3 months ago

We could also get a lot of the type-checking power of this by having a default matcher implementation that relied on an instanceof check (which is already customizable via @@hasInstance) returning [subject] on success. With that, the above code would also work even with this Point implementation:

One of the reasons Pattern Matching uses Symbol.customMatcher and not Symbol.hasInstance is that hasInstance/instanceof doesn't work for primitives: 1 instanceof Number is false, while 1 is Number will be true. In addition, Symbol.hasInstance/instanceof does not work across realms, thus you must write Array.isArray(ar) instead of ar instanceof Array, while pattern matching will allow for ar is Array (using the new protocol).
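
The primitive case is easy to check in any current engine. In the sketch below, `isNumber` is a hypothetical helper showing the kind of check a matcher would need in order to accept both primitives and wrapper objects; how the built-in matchers will actually be specified is up to the pattern matching proposal.

```javascript
console.log(1 instanceof Number);             // false: 1 is a primitive
console.log(new Number(1) instanceof Number); // true: wrapper object
console.log(typeof 1 === "number");           // true

// Hypothetical check accepting both primitives and wrapper objects.
const isNumber = (value) => typeof value === "number" || value instanceof Number;

console.log(isNumber(1));   // true
console.log(isNumber("1")); // false
```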

eemeli commented 3 months ago

Extractors do not provide reflection. You cannot take a function f(Point(x, y)) {} function object and reflect that the first argument takes a Point, just as you also cannot take a const { p1: Point(x, y) } = obj to reflect over p1 being a Point.

True; they do the more JavaScripty thing of allowing for Point to define for itself what's valid, so the argument could be a plain object { x, y } or a tuple [x, y] or anything else that Point[Symbol.customMatcher] is ok with.

Outside the function, you're right that the power of this is significantly reduced, unless Point doesn't customize @@customMatcher and uses a default one that's doing an instanceof check.

The const Point { x, y } syntax was already proposed as part of this proposal, but has subsequently been rejected as it eats up too much syntax space in assignment patterns. For example, if foo{ bar } = obj were to be supported, we could never use identifier { for any other future syntax without an insanely complex cover grammar. This is immediately relevant because the following is already legal JS and would run afoul of an ambiguity:

class C {
  static {}
}

I'm not sure that I see the ambiguity with AssignmentPattern in the above, but the case made in #8 that foo[bar] = baz is already valid applies as well; I hadn't thought of that when writing my previous comment.

So something like const Point p could only work if it was limited to BindingPattern, i.e. something like:

ExtractorBindingPattern :
  ExtractorMemberExpression Identifier
  ExtractorMemberExpression ObjectBindingPattern
  ExtractorMemberExpression ArrayBindingPattern

And even then, parsing const foo[bar][baz] = ... would be a bit messy, even though it could have only one valid meaning.

To allow for custom matching behaviour during assignment, the parenthetical syntax is required, so either we use that everywhere, or we do the same sort of thing as with import vs import(), where the same kind of operation uses different syntax in different places.

One of the reasons Pattern Matching uses Symbol.customMatcher and not Symbol.hasInstance is that hasInstance/instanceof doesn't work for primitives: 1 instanceof Number is false, while 1 is Number will be true. In addition, Symbol.hasInstance/instanceof does not work across realms, thus you must write Array.isArray(ar) instead of ar instanceof Array, while pattern matching will allow for ar is Array (using the new protocol).

I agree that there is a need for @@customMatcher, and that String, Number & co. should include definitions of it that resolve as primitive values. I don't understand how @@hasInstance "does not work across realms", but that's perhaps not really relevant.

rbuckton commented 3 months ago

I don't understand how @@hasInstance "does not work across realms", but that's perhaps not really relevant.

More specifically, the default implementation of @@hasInstance/instanceof does not work for built-ins such as Object, Array, and RegExp when the value (for example an object, array, or RegExp literal) comes from another realm/frame.

So if you write

<script>
function checkEach(obj, ar, re) {
  console.log("Object:", obj instanceof Object);
  console.log("Array:", ar instanceof Array);
  console.log("RegExp:", re instanceof RegExp);
}
</script>

in the outermost window and call top.checkEach({}, [], /./) inside an iframe it prints the following:

Object: false
Array: false
RegExp: false

Thus, instanceof for built-ins is unreliable across realms/frames.
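
The same effect can be reproduced outside a browser. This is a sketch assuming Node.js run as a CommonJS script, where each `vm.runInNewContext` call evaluates code in a separate realm; the `foreign` object is just an illustrative name.

```javascript
const vm = require("node:vm");

// Each value is created inside a fresh context, i.e. a different realm.
const foreign = {
  obj: vm.runInNewContext("({})"),
  ar: vm.runInNewContext("[]"),
  re: vm.runInNewContext("/./"),
};

console.log("Object:", foreign.obj instanceof Object); // Object: false
console.log("Array:", foreign.ar instanceof Array);    // Array: false
console.log("RegExp:", foreign.re instanceof RegExp);  // RegExp: false

// Checks that do not depend on the current realm's prototypes still work.
console.log(Array.isArray(foreign.ar)); // true
console.log(Object.prototype.toString.call(foreign.re) === "[object RegExp]"); // true
```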

ljharb commented 3 months ago

Additionally, anything can use Symbol.hasInstance to lie, so it's generally not a good tool to reach for.

rbuckton commented 3 months ago

Additionally, anything can use Symbol.hasInstance to lie, so it's generally not a good tool to reach for.

I don't think that's relevant, the same thing applies to Symbol.customMatcher.
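
Both hooks are ordinary user-definable methods, which a short sketch makes concrete. `Liar` is a made-up class, and the `customMatcher` symbol is a stand-in because current engines do not define `Symbol.customMatcher`.

```javascript
// Stand-in symbol; current engines do not define Symbol.customMatcher.
const customMatcher = Symbol.customMatcher ?? Symbol.for("customMatcher");

class Liar {
  // Claims that every value is an instance.
  static [Symbol.hasInstance]() {
    return true;
  }
  // Claims that every subject matches.
  static [customMatcher](subject) {
    return [subject];
  }
}

console.log(1 instanceof Liar);           // true
console.log({} instanceof Liar);          // true
console.log(Liar[customMatcher]("nope")); // [ 'nope' ]
```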

eemeli commented 3 months ago

Extractors are not the correct feature to build runtime types on. While you can use extractors for runtime assertions, their purpose is to augment destructuring. IMO, a better feature upon which to build runtime types would be Decorators. For any runtime types implementation, I believe you need more than just input validation for them to be feasible, you also need introspection/reflection.

Returning to the deeper point here, I think it's important to not only consider what extractors are meant for, but also what capabilities they offer beyond that initial purpose. In particular when used in code like

function drawLine(Point(p1), Point(p2)) { … }
const Point(p) = getPoint(…);

the extractors look like, and are, providing assertions about values satisfying Point[Symbol.customMatcher]. Sure, those assertions are really only facing inwards, and they're user-definable, and they're not really at all what types mean in other languages, but this honestly feels like a very JavaScripty way to do runtime typing. It also feels like a solution for maybe 80% of the use cases that one might have for runtime types, and that's good, because it's being reached with a comparatively small addition to the language.

As a library developer, I would absolutely use this syntax on public APIs to ensure that e.g. my functions are being called with the expected arguments. I probably wouldn't use it internally until build tooling is good enough to strip it out from places where it's unnecessary, but that won't take too long. This one change would probably subsume a decent chunk of what I'm using TS for, actually.