tc39 / proposal-pattern-matching

Pattern matching syntax for ECMAScript
https://tc39.es/proposal-pattern-matching/
MIT License
5.45k stars 89 forks source link

Splitting Variable/Extractor matcher hook? #303

Closed tabatkins closed 6 months ago

tabatkins commented 11 months ago

Right now, we're using a single Symbol.customMatcher method to handle both Variable Patterns Foo and Extractor Patterns Foo(...). The former can just return true/false, but the latter needs returns either an iterable or false. (And true/iterable both do the obvious thing when used in the opposite context.)

Matthew brought up in the champion meeting today that he was somewhat concerned about the perf impact of this in variable pattern cases - String just checks if your value is a string (or a String), while String(...) will run the pattern on the (primitive, unwrapped if necessary) string, so the String[Symbol.customMatcher] method has to be defined to essentially be:

String[Symbol.customMatcher] = function(subject) {
    if( typeof subject == "string" ) return [subject];
    if( subject instanceof String ) return [String(subject)];
    return false;
}

And incurring the cost of a temp array for the likely much more common String variable matcher didn't seem very great. We resolved in the meeting to make sure that we spec things such that, when used in the pattern, the built-ins' return values are unobservable, so the temp array can be elided entirely by codegen. (You'd only see it if you invoked the method directly.)

But this doesn't help authors, who have to write the less efficient code all the time. Maybe we can solve this more directly, by splitting the hooks into two:

We check for both hooks on an object, in both cases, but check for the appropriate one first. That is, Foo will invoke customMatcher if it exists; if it doesn't, but customExtractor does, it invokes that (and treats an iterable as a true result); if neither exists, it's just equality-checked, as normal for variables. Similarly, Foo(...) will invoke customExtractor if it exists; if it doesn't, but customMatcher does, it invokes that (and treats true as an empty iterable); if neither exists it's a TypeError.

So that would mean that String would be instead defined as:

String[Symbol.customMatcher] = function(subject) {
    return typeof subject == "string" || subject instanceof String;
}
String[Symbol.customExtractor] = function(subject) {
    if( typeof subject == "string" ) return [subject];
    if( subject instanceof String ) return [String(subject)];
    return false;
}

This lets us avoid having to be clever in codegen for built-ins and also lets authors receive the same benefits - Option.Some can just return a bool. In more complex cases where returning the values for an extractor might be expensive, you can avoid quite a bit of work when not necessary.


For the "auto-create a custom matcher if you didn't provide one in the class block" behavior, I presume we'd only auto-generate the customMatcher one; we can't assume anything meaningful for extractors by default. But if you only provide a customExtractor, should we still auto-define the customMatcher, or just defer to the author's customExtractor? I'm leaning toward no - if the author defines either, we don't generate anything, because they know what they're doing.

Jack-Works commented 11 months ago

if Function.prototype.customMatcher is frozen, we will have a new fail point for override mistakes!

I support split it into two entry points.

ljharb commented 11 months ago

This seems like wildly overcomplicating the proposal; iterating over an unmodified array should be very cheap with current optimizations, and we create extra throwaway containers constantly (including in the entire iterator protocol).

We shouldn't be concerned with potential performance issues, I think, until an implementor has concretely indicated such a concern.

I think one protocol hook point is more than sufficient, and I suspect if we tried to introduce two we'd get a lot of pushback. I'm also not sure why we'd freeze anything.

tabatkins commented 11 months ago

As was noted in the chatroom, the concern I'm quoting does come from a (Firefox) implementor (and Shu indicated a very similar concern in #253, so that's two browsers). Outside of the built-ins, I can also very easily imagine that constructing the iterator contents (beyond the iterator itself) might be a relatively expensive operation for user-defined extractors, and something they might want to avoid when users are just invoking Foo to get a typecheck, which I imagine will be very common.

ljharb commented 11 months ago

I'm sure there's many other API designs that don't require creating a second protocol, which is an exceedingly costly thing to do from a language surface area standpoint.

tabatkins commented 11 months ago

I mean, if you would like to suggest one, feel free. The best I could imagine would be to have matchers return a function on success, which is invoked to return the iterable. But that's somewhat awkward (and still costs an arrow function).

ljharb commented 11 months ago

I will certainly do so after I'm done traveling post-TC39.

tabatkins commented 11 months ago

Another option, instead of two hooks, is to pass a context argument to the function as a second argument. Probably an options bag, to allow for future expansion. Currently it would just tell you whether you're being invoked with or without an arglist, so you can decide between doing a cheap boolean test or constructing an iterator.

Like:

String[Symbol.customMatcher] = function(subject, {matchType}) {
    return match(matchType) {
        when "no-args" and if(typeof subject == "string" || subject instanceof String): true;
        when "with-args" and if(typeof subject == "string"): [subject];
        when "with-args" and if(subject instanceof String): [String(subject)];
        default: false;
    };
}

Functions that don't care can just not pay attention to the argument; we'll still do the true <-> iterator conversion as necessary.

ljharb commented 11 months ago

That seems much simpler to me than a second symbol.