tc39 / proposal-pattern-matching

Pattern matching syntax for ECMAScript
https://tc39.es/proposal-pattern-matching/
MIT License
5.51k stars 89 forks source link

Use scala-style extractors for extensible, nestable patterns #63

Closed josephjunker closed 6 years ago

josephjunker commented 6 years ago

I would like to propose support for Scala-style extractors. What follows is background on why I believe this support is important, and a sketch of how this support could be accomplished. This proposal is heavily, heavily based on the paper Matching Objects with Patterns. The explanation here is long and possibly pedantic, because I want to introduce the background which I find makes extractors an intuitive solution to pattern matching in a hybrid language such as JavaScript.

As a TLDR, I believe that support for extractors, nested patterns, and a clean destructuring syntax can be added as an extension to the current proposal, while maintaining the current proposal's features, and that these extensions would compose cleanly with all of the features provided in the current proposal. Doing so would increase the power and expressivity of JS patterns, making them more powerful than patterns in most functional languages, and would solve certain pain points which exist in the current proposal around both destructuring and nested patterns. If you want to skip past the background, the concrete changes I'm proposing are in the section labelled "Concrete Proposal" at the very end, and the two sections before that show examples of what pattern matching would look like if extractors were adopted.

Background - pattern matching in functional languages

Pattern matching has its roots in ML-family languages, in which data is represented via algebraic data types. The atomic form of data in these languages is the "constructor", which is a reversible function that holds other pieces of data and attaches a distinct label to them. These data types are called "algebraic" because individual data types can be composed to create new data types via sums and products. When a constructor takes multiple pieces of data, it represents a product of these two data types. A data type may also be expressed as a sum, which is a choice between one of several data types.

When data is expressed in this way, consuming data requires unwrapping and examining the data's nested constructors and their contents. When the input data is a sum type, a case must be specified for each member of that sum type. These factors are the motivation behind pattern matching in the vast majority of languages which support it today, and have lead to the following characteristics being common to the pattern matching syntaxes of most functional languages:

1) Patterns are expressed in terms of constructors. 2) Patterns are recursively nestable to an arbitrary depth. 3) Patterns declaritively express the data that they match; the structure of a pattern mirrors the structure of the mapped data 4) Patterns compose, and pattern composition mirrors data composition. If I have a parametric type Foo<A> and a concrete type Bar, I can compose these types to create the concrete type Foo<Bar> by placing the Bar inside of the Foo. If I have a pattern which matches a Foo type, and a pattern which matches a Bar type, I can compose these patterns into a pattern which matches a Foo<Bar> type by placing the Bar pattern into the Foo pattern. This is a natural extension of characteristic #3. 5) Patterns bind names to values inside of constructors by placing those names inside the constructors in the patterns, at the location where the bound data will be. This is also a natural extension of characteristic #3.

Currently, most of these characteristics do not hold for this pattern matching proposal. This is largely exacerbated by an impedence mismatch between JS and the ML-family languages that pattern matching comes from-- In JS, classes and object literals are the fundamental building blocks of data, not reversible constructors. Scala has a similar impedence mismatch, and has addressed it using extractors, which allow for support for ADT-style pattern matching in an object oriented language, while providing the benefits of both the functional and OO styles.

Extractors

ADTs exist in JS, although they don't have explicit syntax, and must be expressed via classes and programmer conventions. (This is shown in the example section below.) They're useful, and 90% of the code I write is JavaScript using ADTs as a data representation. However, a good pattern matching solution should not be restricted to a particular style of functional JS. For the way most JS is written, there are four main differences between JS data structures and ADTs:

1) ADT constructors are reversible; a JS object created via a class constructor does not provide a way to recover the original constructor arguments 2) ADTs are immutable, while JS objects mutate frequently. This makes recovering the original constructor arguments even more impossible. 3) ADT constructors store their arguments linearly; JS objects store their values in named fields 4) ADTs are necessarily created via constructors; JS objects may be created via object literals.

Scala has these differences as well, and uses a technique called "extractors" to bridge the gap. I believe this technique is just as applicable to JS as it is to Scala.

With ADTs, matching a piece of data to a clause inside a pattern matching statement can be seen as having several distinct components:

1) data being matched comes into a matching clause 2) the outer constructor of the data is compared to the outer constructor in the pattern. If they mismatch, the match fails. 3) the constructor is de-constructed, and a linear list of arguments to that constructor is pulled out 4) these arguments are paired up with the sub-patterns of the pattern being matched, and matching proceeds recursively

With extractors, we combine steps 2 and 3 using an extractor function. This is a function which, given an object, tests whether the object matches the type of data that the extractor is expecting, and returns null if there is a mismatch. (This is the same as the role of the function defined at Symbol.matches in the current proposal.) If there is not a mismatch, though, an extractor returns an array of values, rather than just true. This array is a simulation of the arguments to the type being matched; they may be the arguments that were passed to the object's constructor, and were recovered from the object's structure, or they may be other values which represent the structure of the object. In either case, an extractor takes a piece of data and gives us a list of values, which is what a reversible constructor does in functional languages, and is what is needed to achieve the characteristics of pattern matching which are listed in the first section.

Extractors encompass and surpass reversible constructors in their expressivity. For instance, if I implement an immutable class Foo which takes three arguments, and an extractor for that class, I can do something like this:

class Foo {
  constructor(a, b, c) {
    this.a = a;
    this.b = b;
    this.c = c;
  }
}

function extractFoo(foo) {
  return foo instanceof Foo ?
    [foo.a, foo.b, foo.c] :
    null;
}

extractFoo(new Foo(1, 2, 3)); // [1, 2, 3]

Here I've reached parity with reversible constructors; my extractor allows me to recover the original arguments to the Foo constructor, as long as I follow the convention of not mutating my Foo instances. This shows that extractors are at least as powerful as reversible constructors.

Extractors support much more than this, though. Here is a hypothetical extractor for strings, which could be provided as a method on a RegExp object:

extract(str) {
  return this.match(str);
}

This would allow for a regex to be treated as though it were a constructor for strings inside of a pattern matching statement. Sub-patterns which perform tests on values or bind values to variables could be passed to this "constructor", allowing easy binding and tests on the results of a pattern match on the left-hand side of a pattern matching statement. This is shown in the examples section.

Motivating example: Nested pattern matching in JS currently, and in this proposal

Here's an example of writing something in the ADT style, in ES6 as it stands now. First, I declare the data types/constructors that I'll be using:

class Apple {
  constructor(cultivar) {
    this.cultivar = cultivar; // cultivar is a string
  }
};

class Orange {
  constructor(cultivar) {
    this.cultivar = cultivar;
  }
};

class TooSmall {
  constructor(weight) {
    this.weight = weight; // weight is a number
  }
}

class Blemished {}

I will be using Apple and Orange as a "Fruit" sum type, which would be expressed in Flow like type Fruit = Apple | Orange. Likewise, TooSmall and Blemished will be used as a "Flaw" sum type, expressed like type Flaw = TooSmall | Blemished.

Next I'll define an Either type, a common error-handling mechanism in functional languages. An Either is a sum type, which is either a Left holding one type or a Right holding another type:

class Left {
  constructor(value) {
    this.value = value;
  }
}

class Right {
  constructor(value) {
    this.value = value;
  }
}

In Flow terms, this would be expressed as type Either<A, B> = Left<A> | Right<B>.

The Either type functions as a discriminated union. By convention, Left signifies an error, and Right signifies a success. I can compose these classes to produce data type which describes the results of trying to obtain a fruit from some fruit-retrieval service. This service will either return a Right containing a fruit (which is an Apple or an Orange) or a Left containing the reason why the requested fruit was rejected (due to it being TooSmall or having a Blemish.)

In the current state of JS, this is how I would write a function to consume this type, and return a human-readable string describing the fruit which was retrieved:

function describeFruit(e) { // e has type Either<Flaw, Fruit>
  return e instanceof Left && e.value instanceof TooSmall ? `Could not retrieve fruit; it only weighed ${e.value.weight} ounces`
  :      e instanceof Left && e.value instanceof Blemish  ? `Could not retrieve fruit; it was damaged.`
  :      e instanceof Right && e.value instanceof Apple   ? `Retrieved a ${e.value.cultivar} apple.`
  :      e instanceof Right && e.value instanceof Orange  ? `Retrieved a ${e.value.cultivar} orange.`
  :      /* no such match */ tilt();
}

function tilt() {
  throw new Error("Non-exhaustive pattern match");
}

This can be seen as a simulation of pattern matching. I'm comparing the constructors of data, and pulling out the pieces of data which were passed to those constructors.

In this case, the current proposal provides a slight amount of syntax sugar for this approach:

function describeFruit(e) { // e has type Either<Flaw, Fruit>
  match (e) {
    Left if e.value instanceof TooSmall : `Could not retrieve fruit; it only weighed ${e.value.weight} ounces`,
    Left if e.value instanceof Blemish  : `Could not retrieve fruit; it was damaged.`,
    Right if e.value instanceof Apple   : `Retrieved a ${e.value.cultivar} apple.`,
    Right if e.value instanceof Orange  : `Retrieved a ${e.value.cultivar} orange.`
  }
}

This gives some improvement, but it largely looks the same. We've gotten rid of the first instanceof check on each line, which is good, but we've added an if, and the overall structure is still verbose. Both of these solutions lack the characteristics of pattern matching in functional languages which I listed in the first section. If I were to try to specify a match that was three levels deep in either of these styles, or a match on a constructor which contained two or three values, the resulting code would be unworkably verbose.

By comparison, in a functional language with pattern matching, the code would probably look something like this:

match (e) {
  Left(TooSmall(weight))  : `Could not retrieve fruit; it only weighed ${weight} ounces`,
  Left(Blemish)           : `Could not retrieve fruit; it was damaged.`,
  Right(Apple(cultivar))  : `Retrieved a ${cultivar} apple.`,
  Right(Orange(cultivar)) : `Retrieved a ${cultivar} orange.`
}

I think this is much more comprehensible, for several reasons. One is that the first and second matches of each case are expressed in the same way; Left and TooSmall are both constructors, and so they both have the same syntax, instead of one needing if and instanceof while the other doesn't. Another benefit is that the structure of the data being operated upon is expressed explicitly; when we handle a cultivar inside of an Apple inside of a Right, we do so next to an explicit representation of this structure. This is more concise and expressive than describing this structure with a series of accesses and conditionals. ADTs may be composed to an arbitrary depth, and as the nesting increases, so does the benefit of this style of pattern matching over a series of sequential conditionals. Finally I like that this solution gives us an obvious way to bind variables, by putting them inside of the matched constructors like in most functional languages. This reduces accessor noise on the right hand side of the matches.

Example of matching ADT-like data

My proposal is to allow solving the previous problem like this: (the mechanism by which this is done is explained in the "Concrete Proposal" section at the end.)

class Apple {
  static [Symbol.matches](apple) {
    return apple instanceof Apple ? [apple.cultivar] : null;
  }
  constructor(cultivar) {
    this.cultivar = cultivar;
  }
};

class Orange {
  [Symbol.matches](orange) {
    return orange instanceof Orange ? [orange.cultivar] : null;
  }
  constructor(cultivar) {
    this.cultivar = cultivar;
  }
};

class TooSmall {
  static [Symbol.matches](tooSmall) {
    return tooSmall instanceof TooSmall ? [tooSmall.weight] : null;
  }
  constructor(weight) {
    this.weight = weight;
  }
}

class Blemished {
  static [Symbol.matches](blemished) {
    return blemished instanceof Blemished ? true : null;
  }
}

class Left {
  static [Symbol.matches](left) {
    return left instanceof Left ? [left.value] : null;
  }
  constructor(value) {
    this.value = value;
  }
}

class Right {
  static [Symbol.matches](right) {
    return right instanceof Right ? [right.value] : null;
  }
  constructor(value) {
    this.value = value;
  }
}

function describeFruit(e) { // e has type Either<Flaw, Fruit>
  match (e) {
    Left(TooSmall(@weight))  : `Could not retrieve fruit; it only weighed ${weight} ounces`,
    Left(Blemish)            : `Could not retrieve fruit; it was damaged.`,
    Right(Apple(@cultivar))  : `Retrieved a ${cultivar} apple.`,
    Right(Orange(@cultivar)) : `Retrieved a ${cultivar} orange.`
  }
}

As detailed in my proposal below, the @ sigil inside a pattern match creates a pattern which binds the value being matched to the variable whose name follows the @ within the body of the case corresponding to the pattern.

Here we have the whole set of benefits of ADT-style pattern matching, with objects, with minimal overhead. The overhead could be reduced farther if the behavior of pattern matching was defined as performing an instanceof check automatically when the pattern used is a class; this would allow Symbol.matches on classes to remove their instanceof check and ternary, and just return an array. In the case of the Blemish class we can return true rather than an array, as in the current pattern matching proposal, because Blemish does not contain any contents which could be matched on recursively. As in the current proposal, a default implementation of [Symbol.matches] could be provided for classes, which return true if an insteanceof check against the class succeeds and false/null otherwise. This would allow the Blemished class to be defined simply as class Blemished {}.

Examples of matching non-ADT-like data

The examples here are ripped from the F# documentation on active patterns and modified to be written in JavaScript. In the current proposal we can write matchers for arbitrary data. For example, we could write an Even class and an Odd class whose matcher functions expect a number; Even returns true if the number is even, Odd returns true if the number is odd. We can thus match numbers like this:

match(i) {
  Even : `${i} is even`,
  Odd  : `${i} is odd`
}

This is basically syntax sugar for defining even and odd functions and using a ternary. We could do a step better by defining two different extractors for a Color type. Given a color type whose underlying representation is in RGB, we can allow this type to be matched in alternate representations:

class Color {
  constructor(r, g, b) {
    this.r = r;
    this.g = g;
    this.b = b;
  }
}

const RGB = {
  [Symbol.match]: color => color instanceof Color ? [color.r, color.g, color.b] : undefined
};

const HSV = {
  [Symbol.match]: color =>
    color instanceof Color ?
      [getHue(color), getSaturation(color), getValue(color)] :
      undefined
};

Now we can write a pattern match which interacts with and binds variables in multiple representations:

// assume isOver80 is a matcher returning true for numbers over 80
match(color) {
  RGB(255, @g, @b) : // handle a very red color.
  HSV(@h, @s: isOver80, @v) : // handle very saturated colors.
  else: // handle other colors
}

In all branches, the original color object is available due to standard lexical scoping. In the RGB branch, g and b are available. g and b are the second and third results returned by the RGB extractor, and are numbers representing the green and blue components of the Color, respectively. In the HSV branch, the h, s, and v variables are available. This demonstrates how extractor-based pattern matching surpasses ADT-based pattern matching; we have the lightweight, composable syntax of traditional pattern matching, but now the structures that we match can vary independently of the concrete structure of the matched data. In OO fashion, we can maintain encapsulation of the Color type, and provide callers with a set of extractors which each provide a different interface to Color.

To return to the RegExp example from the "extractors" section, this is what it would look like to match a regex:

match(str) {
  /([0-9]+) (foo.*)/(@numbers, @foo) : // numbers and foo are bound here
  /([A-Z]+) (bar.*)/(@letters, @bar) : // letters and bar are bound here
  else: // handle match failure
}

As well as binding patterns like @numbers and @foo, literal or reference patterns could be passed to the regexp pattern, causing that match to fail if the parsed value was not what was expected. This gives pattern matching against regexes the same power and composability that we have for pattern matching against arrays or objects, without any special-case logic for regexes in the pattern matching implementation.

This kind of expressivity may be possible to simulate to some degree in the current proposal using combinators which produce matchers from other matchers, but I believe this would be cumbersome, and also that this would be difficult to combine with destructuring/variable binding.

Concrete Proposal

I propose adding four additional cases to MatchExpressionPattern, one of which allows match patterns to be recursive:

MatchExpressionPatterns :
  <nothing> | MatchExpressionPattern "," MatchExpressionPatterns

MatchExpressionPattern :
  ObjectMatchPattern
  ArrayMatchPattern
  IdentifierMatchPattern
  LiteralMatchPattern
  ExtractorMatchPattern
  ValueBindingPattern
  ValueBindingModifier
  IgnorePattern
  `else`

ExtractorMatchPattern : <name of class or object currently in scope, which has a Symbol.match field> ( MatchExpressionPatterns )

ValueBindingPattern : <any valid variable name, beginning with @, or some other sigil>

ValueBindingModifier : <any valid variable name, beginning with @, or some other sigil> ":" MatchExpressionPattern

IgnorePattern: "_"

The MatchExpressionPatterns inside a ExtractorMatchPattern would be parsed as an array of MatchExpressionPatterns. When ExtractorMatchPattern is encountered, the object being matched is passed to the extractor. If the extractor returns undefined, the match fails. If the extractor returns true, then the Symbol.matches function is using basic matching behavior. In this case, the match succeeds as long as no patterns are nested inside this ExtractorMatchPattern. If there are nested patterns and true is returned, then the match fails, because the nested patterns cannot be satisfied. If the extractor returns an array, then the returned array of objects is matched up with the array of patterns contained within the current ExtractorMatchPattern. If the array of returned objects is shorter than the array of patterns, the match fails. Each pattern is then matched to its corresponding value from the extractor array, and matching proceeds recursively. If every pattern matches, the match succeeds, but if any pattern does not match, the match fails.

The ValueBindingPattern always succeeds in a match. When it is matched, it has the side effect of collecting the value it was matched against, which will then be brought into scope within the pattern's corresponding AssignmentExpression.

The ValueBindingModifier contains another MatchExpressionPattern, and succeeds in a match if the contained pattern succeeds. If the contained pattern succeeds, this also has the same side effect as the ValueBindingPattern, making the value which is currently being matched available to the AssignmentExpression under the name provided in the ValueBindingModifier.

The IgnorePattern always succeeds in a match. It does nothing.

Composition with existing features in this proposal

Any kind of pattern could be nested within an extractor pattern, so extractor patterns would compose with object, array, identifier and literal patterns. Symbol.matches could still return a boolean in cases where recursive nesting is not needed, so no functionality of the existing proposal would be lost.

rattrayalex commented 6 years ago

This is very interesting. If it is feasible, it seems powerful and not unintuitive. I haven't read all the way through, but I'm curious; would this be possible/conformant to your proposal?

const Some = {
  [Symbol.match](x) { return x == null ? null : [x] }
}
const None = {
  [Symbol.match](x) { return x == null ? [] : null }
}
const doubledOrNothing = match (x) {
  Some(@y): y * 2,
  None: 0
}

The parens look like a function call, which might be a bit confusing (and obviously, forever outlaws function calls within match expressions, which may or may not be for the best). I wonder if angle braces could work better:

match (x) {
  Left<TooSmall<@weight>>: `too small: ${weight}`,
  Right<Apple<@cultivar>>: `delicious! ${cultivar}`,
}

This looks like polymorphic types, but in a context where they clearly wouldn't make sense, so I don't think it'd present much of a problem. Just an idea.

Nice writeup!

rattrayalex commented 6 years ago

(The general idea of composability within match expressions is one I hope the team working on this language feature takes very much to heart).

josephjunker commented 6 years ago

Yes, that code is exactly the kind of thing I'm imagining.

I see your point about the parenthesis looking like a function call. My thinking was that this would be differentiated from function calls based on the existence of a Symbol.match function, but I can see how this would be hairy, and I wouldn't be opposed to the use of something other than parenthesis.

rattrayalex commented 6 years ago

Cool! I haven't been following this repo super-closely in the past few months, but I see three new ideas here that I think are worth serious consideration:

  1. Symbol.match functions returning either null or an Array of elements (thus supporting regexp matches and arbitrary others).
  2. Nesting/composing MatchExpressions with some syntax (eg; Left<TooSmall<@weight>> or Left(TooSmall(@weight)), possibly something else could work too).
  3. Differentiating "captured" identifiers from "assertive" identifiers with the use of an @ prefix (eg; [a, @b, c]: b to allow asserting that the first element is a and the third c while extracting the second element b).

The first idea seems like an elegant solution to an existing problem that would add a great deal of power to the feature.

The second seems higher-risk, higher-complexity, but could ultimately bring the match feature up to par with other languages and allow for some really elegant code.

The third idea, if not new to this repo, seems the most obviously worth adopting to me. It resolves a series of issues in a fairly intuitive way without clashing with existing syntax, afaict (I don't think decorators aren't legal here anyway).

Does that seem like a good summary?

What do you think @bterlson ?

josephjunker commented 6 years ago

Yes, I think you nailed it exactly. One point I'd like to double down on, though, is that I think item 2 is really necessary for a feature to be accurately called "pattern matching"-- the core notion of a pattern in most languages is that it's a declarative representation of the data structure being worked with, including any relevant layers of nesting in that structure. Item 1 is really just a way to enable items 2 and 3 in a hybrid language.

(Edit for clarity: When I say I think item 2 is necessary, I mean item 2 or some substitute, which allows for recursively nestable patterns on the left side of a match line without needing nested match statements on the right side)

josephjunker commented 6 years ago

I also really like the terms "captured" and "assertive" identifiers, I was struggling to find a good name to describe those.

camminati commented 6 years ago

This is very interesting and very good explained. I think/hope this should/will be in the core of pattern matching for JS right up.

A quick question, as this is some how related to the extractors, although not exactly the topic here: could you write then a matcher for a recursive call like:

def listToString(list: List[String]): String = list match {
    case s :: rest => s + " " + listToString(rest)
    case Nil => ""
}

scala example

like this:

const listToString = match(list) {
    [@s ...rest] : `t${s} ${listToString(rest)}`;
    [] : ´´;
}

would like it to be JS

And by the way, I don't see the problem with forever banning function calls on the left hand side of the case clause, that way you keep side effects out of the way.

josephjunker commented 6 years ago

This would not quite work with how I've defined my proposal above, but I think the proposal could easily be amended to accommodate this. The way I would probably do it is by making the ArrayMatchPattern be sugar syntax for an extractor on arrays, where this extractor just returns the array itself. The definitions would also need to be tweaked to allow for a new ArrayRestPattern as an option inside of a MatchExpressionPattern. I imagine that this would look something like this:

const listToString = match(list) {
    [@s, ...@rest] : `t${s} ${listToString(rest)}`;
    [] : ´´;
}

I think this would be a really good addition, because there's plenty of other extractor cases where a rest collection would be useful, such as inside a RegExp extractor.

This said, in this case I think there's a better way to solve this problem, which shows the power of extractors. (I'm not trying to criticize your code here; I know this is a quick contrived example. But your code demonstrates something that is a pain point for me in JS, and I think it can make a good example.)

How listToString is written now, it's a polynomial operation. If the array is 5 items long, in the first recursive step this function will pull one item off the array, and then make a copy of the last 4 items of the array, then in the next step it will make a copy of the last 3 items, etc. The result is an n step algorithm, which performs an average of n/2 copies per step, making it O(n^2) in all. When operating on a list, performing one simple operation per item, we'd really like our algorithm to run in linear time, like an imperative solution would. The problem here is that recursive list processing as it is commonly done in functional languages is naturally and efficiently defined on linked lists, but not on arrays. This leads to an impedance mismatch in JavaScript, due to arrays having convenient syntax sugar but the wrong structure for these kinds of algorithms (a regular source of annoyance for me.)

The way I would write this function using extractors would be something like this (leaving tail-call related optimizations out of this example):

class Cons {
  [Symbol.match](x) {
    return Array.isArray(arr) && arr.length > 0 ? [arr[0], new Cons(arr, 1)]
    :      x instanceof Cons && !x.isDone()     ? x.next()
    :      /* type mismatch                    */ null;
  }

  constructor(arr, index) {
    this.underlyingArr = arr;
    this.index = index;
  }

  next() {
    return [
      this.underlyingArr[this.index],
      new Cons(this.underlyingArr, this.index + 1)
    ];
  }

  isDone() {
    return this.underlyingArr.length <= this.index;
  }
};

const Nil {
  [Symbol.match]: x =>
    (Array.isArray(x) && x.length === 0) ||
    (x instanceof Cons && x.isDone())
}

const listToString = match(list) {
    Cons(@s, @rest) : `${s} ${listToString(rest)}`;
    Nil: ``;
}

There's a lot here, because I've defined two reusable extractors, Cons and Nil. These can be reused for any algorithm on arrays which we want to express in terms of recursive operations on a linked list, while maintaining the same asymptotic time and space complexity. They're more complicated than the extractors in my first post, because they both match multiple kinds of inputs: Array and Cons. The idea is, we can use these in a match expression on an Array, and if we end up matching a Cons cell, for the remainder of the recursive algorithm we'll be dealing with Cons cells. These aren't actual Cons cells as implemented in a traditional linked list, though. Rather, they're "views" into an array. The extraction function creates these new Cons views on-demand as the algorithm continues recursively, and Cons is implemented such that all of the cells point to the same underlying array instance. As a result, there's no array copying and the additional memory overhead is constant, and relatively similar to what we'd pay if we used an iterator in a for loop.

Due to the nature of patterns and extractors, we can use Cons and Nil recursively. Here's an example of a recursive function that breaks an array into tuples of length 2, leaving off any odd remainder at the end. (I wrote this one in tail-recursive style, for contrast, and because this gives a realistic idea of what this style would look like in production use)

function tupleify(arr) {
  const recursive = (arr, results) => match(arr) {
    Cons(@x, Cons(@y, @rest)): tupleify(rest, results.concat([x, y]));
    Cons(_, Nil): results;
    Nil: results;
  }

  return recursive(arr);
}

The list/array impedance mismatch is a case where the power of extractors really shines. We have functions which can be elegantly implemented on one data structure (a recursive cons list) but need to run efficiently on a completely different data structure (an array). Extractors allow for a decoupling between the concrete implementation of data and the interface through which we work with that data. In cases where a conversion between the actual and desired types can be efficiently accomplished in a piecewise fashion, (like with arrays -> lists) extractors let us perform this conversion implicitly and lazily.

Footnote: There's more that could be done here. For a full implementation, I'd also want to provide a mechanism for constructing Cons cells in the traditional FP way, by storing whether they're a "real" cons cell or a view into an array internally. But this is excessive for an example in this case.

josephjunker commented 6 years ago

As far as prohibiting function calls on the left side of a pattern matching function goes... I can't think of any case where I've seen a function call made on the left-hand side of a pattern outside of a guard clause. (but since I mostly write JavaScript, it's very possible that someone who spends more time in Haskell/Scala/OCaml might have experience which says otherwise.) It definitely sounds like it would be a niche situation in which an assertion identifier would need to have its value dynamically created within a pattern, and in which creating this in scope before the pattern would not be an acceptable replacement. (I can imagine this happening when there's a large number of patterns, each of which needs a different dynamic call, or when computing this value is expensive and something that we only want to do if previous matches failed. Both sound like rare cases.)

That said, this seems like a pretty big door to close just for the reason of syntax conflict over parenthesis. I'm not worried about the "what if someone adds a side effect" possibility, because it's still possible to add side effects to a match function, or side-effecting getters to the data being matched on. In a language as dynamic as JS, I think trusting devs to not go entirely off the rails is necessary.

While I've never seen it done in other languages (maybe because so few of them use extractors) I think there could be exciting potential for the use of combinators in pattern matches. In other words, patterns could be created by passing arguments to functions which dynamically create objects with a Symbol.match function, or modify (via wrapping and delegation) the matching function of another class/object. This may be unnecessary or more dynamic than would be manageable in practice, but it could also be really cool 😁

Another possibility for syntax other than parens or angle brackets might be curly brackets. I can't think of any syntax which reasonably belongs in a pattern match that they might conflict with, and node already uses them to pretty-print objects that were made from classes. For instance, a class Foo with the fields x and y will have an instance printed like this: Foo { x: 1, y: 'bar' }. I find the parallel between that and the match pattern Foo{1, 'bar'} aesthetically pleasing.

camminati commented 6 years ago

@JosephJNK ok I see it. I get that the listToString implementation is not the best example, because Array is not a real List. It behaves exactly as expected though, where rest is a copy of the Array - s, but it pays a huge fee on performance.

I like your solution with extractors very much. This would allow the use of the likes as immutable collections and leave the implementation of my extractor class as I see fit, or even better, they could provide one.

I'd say extractors FTW!

As for the pattern-syntax, there are so many examples out there. I tend to think, the simple things should look like what you would write in an if-statement. I'm very fond of the scala syntax, specially in the case of matching collections. I am not hard sold on any of these, specially the use of :instead of =>, but that train already left. Curly brackets certainly look better than angle brackets though.

masaeedu commented 6 years ago

I agree that there's an underserved need, but I don't like the idea of forcing the extractor onto the data itself. The purpose of pattern matching is to be able to work systematically with unstructured, heterogeneous data, without first being forced to homogenize it. If I get an array of values from somewhere, I don't want to have to go through and decorate all the objects with symbols before being able to map over it with a match.

The way I've been dealing with this stuff in the absence of real pattern matching is using the following "poor-man's match" function:

const match = (p, ...c) => x => new Map(c).get(p(x))(x)

Let's look at how this works with your use case. Original approach:

function describeFruit(e) { // e has type Either<Flaw, Fruit>
  match (e) {
    Left if e.value instanceof TooSmall : `Could not retrieve fruit; it only weighed ${e.value.weight} ounces`,
    Left if e.value instanceof Blemish  : `Could not retrieve fruit; it was damaged.`,
    Right if e.value instanceof Apple   : `Retrieved a ${e.value.cultivar} apple.`,
    Right if e.value instanceof Orange  : `Retrieved a ${e.value.cultivar} orange.`
  }
}

Alternative:

const describeFruit = match(
  x => x.value.constructor,
  [TooSmall, e => `Could not retrieve fruit; it only weighed ${e.value.weight} ounces`],
  [Blemish, () => `Could not retrieve fruit; it was damaged.`],
  [Apple, e => `Retrieved a ${e.value.cultivar} apple.`],
  [Orange, e => `Retrieved a ${e.value.cultivar} orange.`])

If this looks a little LISP-y, that's no accident; this is an exact ripoff of Clojure's multimethods, sans macro syntax sugar, and has been used to great effect in that language.

The benefit of this approach is that there's no "magic" associated with the data. The data you're matching over still consists of dumb, simple values, and these values can come from anywhere with no preparation required. The smarts are in matching site, which can use an arbitrary projection to slice and dice the data in a way that homogenizes it. I think an extension to the pattern matching feature for JS that takes its inspiration from Clojure would be recognizing a closer kinship than one which tries to imitate statically typed languages by investing data with smarts about itself.


Here's a full code snippet you can play around with:

const { match } = require("natch")

class Apple {
  constructor(cultivar) {
    this.cultivar = cultivar; // cultivar is a string
  }
};

class Orange {
  constructor(cultivar) {
    this.cultivar = cultivar;
  }
};

class TooSmall {
  constructor(weight) {
    this.weight = weight; // weight is a number
  }
}

class Blemished {}

class Left {
  constructor(value) {
    this.value = value;
  }
}

class Right {
  constructor(value) {
    this.value = value;
  }
}

const describeFruit = match(
  x => x.value.constructor,
  [TooSmall, e => `Could not retrieve fruit; it only weighed ${e.value.weight} ounces`],
  [Blemished, () => `Could not retrieve fruit; it was damaged.`],
  [Apple, e => `Retrieved a ${e.value.cultivar} apple.`],
  [Orange, e => `Retrieved a ${e.value.cultivar} orange.`])

console.log(describeFruit(new Left(new Blemished()))) // Could not retrieve fruit; it was damaged.
console.log(describeFruit(new Right(new Apple('Granny Smith')))) // Retrieved a Granny Smith apple.
masaeedu commented 6 years ago

I've read the post a few more times, and I think I might have been misunderstanding what the proposal is. Can "extractors" be attached to some arbitrary object, as opposed to the value you're matching? E.g.

match({ x: 1 })
{
  { [Symbol.match]: ({ x }) => !!x ? [x] : null }(@v): v * 2
}

// => 2

If so, please ignore all of the above; I'd be really excited to see an extension like this to the pattern matching feature.

josephjunker commented 6 years ago

Yes, extractors can be attached to any object. The matched object is not checked for extractors. The purpose of attaching extractors to the classes Left, Right, Apple, Orange, TooSmall and Blemish in my example is that this allows the classes to be used as extractors for the data they construct. When programming with algebraic data types in JS, attaching extractors to the classes which are used to construct this data allows for a direct encoding of pattern matching on algebraic data types; it's a means of demonstrating that classes + extractors have at least as much power as patterns in ML-family languages (which I would argue are the most "canonical" form of the feature.)

As a side note, multimethods are powerful, but are really a different feature than pattern matching-- core.match in Clojure is a better point of comparison to patterns in JavaScript, and the "object patterns" that are described in the main body of the original proposal here are a relatively faithful port of Clojure's pattern matching semantics to JavaScript. Notably, when using core.match, the patterns usable in a match are still tightly coupled to the structure of the data being matched, as in ML-family languages and in the original proposal in this repo. I'm happy to talk more about the distinctions here, but I think they may be slightly afield of the topic at hand.

masaeedu commented 6 years ago

@JosephJNK I see, thanks. I was indeed misunderstanding the proposal originally.

Regarding core.match: Clojure multimethods are the core technology underlying core.match. It isn't appropriate to compare core.match to the feature being proposed here, because core.match is a library, whereas we are discussing the merits of a language feature (for which the analogous Clojure language feature is multimethods).

By providing the ability to write multimethods in Clojure, the language set itself up for the ability to have first class pattern matching in userland: core.match simply provides a way to build efficient case selection projections for a Clojure multimethod. Since they're building a projection function ahead of time, rather than simply trying each case one by one and dispatching on the arbitrary result, they can analyze the entire case table and actually build an optimal decision tree for matching a provided value with the appropriate case. See https://github.com/clojure/core.match/wiki/Understanding-the-algorithm

zkat commented 6 years ago

Hey y'all! #65 has gotten merged, and a lot of issues have become irrelevant or significantly changed in context. Because of the magnitude of changes and subtle differences in things that seem similar, we've decided to just nuke all existing issues so we can start fresh. Thank you so much for the contributions and discussions and feel free to create new issues if something seems to still be relevant, and link to the original, related issue so we can have a paper trail (but have the benefit of that clean slate anyway).

As far as this issue in particular goes: the new proposal does, indeed, have an extractor feature based on Scala's extractors with a new binding format. I think that fulfills the thing this issue was asking for!