Proper sum types and structural pattern matching

omentic commented 1 year ago

Abstract

Proper sum types and structural pattern matching for Nim by the introduction of union and extension of case with structural matching and and where clauses. Heavily inspired by Rust, Swift, Racket, Python, and others, and adapted to Nim's particular type system and syntax.

Motivation

I write a lot of Nim code, and I write a lot of Rust code. I find both to be excellent languages with their own unique tradeoffs and specialties. But there is one thing in particular I greatly miss when going from Rust to Nim: and that is algebraic data types and pattern matching.

Algebraic data types (ADTs), for those unfamiliar, are an extremely powerful data structure that are useful for modelling a wide variety of types. They allow, by the composition of product types and sum types (as well as ordinary types), the modelling a wide variety of different data structures.

Nim does have algebraic data types - objects may be composed arbitrarily. Their application to product types is straightforward: product types are directly analogous to structs (or in Nim, standard objects). Sum types complement product types and can be thought of as their dual: they act as a wrapper around different types, letting you interact with their inner structure only by safely pattern matching upon the container. They are best represented by tagged unions, however, which Nim lacks a direct implementation of.

Variant objects are capable of modelling tagged unions and thus sum types. These are useful as, well, variants of objects, but have a number of serious drawbacks when used as sum types that the first portion of this RFC aims to address.

The names of fields in variant objects must be unique - they cannot be repeated, even across different variants.
A separate enum must function as a "kind" tag and be discriminated on by case within the object. Having a separate single-use enum is undesirable, and having to discriminate against it is awkward.
Being forced into an object structure is awkward: the "type wrapper" approach of tagged unions is more ergonomic, especially when there are no shared fields, as is often the case.

As for pattern matching: Nim has the case statement, which alone is sufficient to make use of the inner contents of an sum type (or currently, switch on a kind label). But structural pattern matching is the bread and butter of ADTs: aside from simply being cleaner than nested if statements, it provides guarantees of exhaustiveness not otherwise given by the alternative. It is perhaps best exemplified by the ML family of languages (including Rust): but other languages have picked it up offering their own takes on it, notably Racket, Swift, and Haxe. This RFC pulls from a number of different designs to provide a powerful and Nim-esque model.

Description

This RFC is two-fold. It introduces a new fundamental type, union, with similar semantics to Rust's enum. It also extends Nim's existing case statement with support for structural pattern matching and particularly that of unions/objects/tuples (although it supports lists as well), and introduces a new supplemental where clause in order to cleanly and powerfully do so.

union is a type that functions similarly to enum at a surface level. It takes a list of identifiers as fields/variants/values, and an instance of a union may only ever be one of the fields. However, each field of the union may additionally be of its own entirely separate type (or none). This is best exemplified with the classic "shape" example:

type Shape = union
  Point
  Line: int
  Square: tuple[side: int]
  Rectangle: tuple[x, y: int]
  Triangle: TriangleKind
  # type TriangleKind = enum Scalene, Isosceles, Equilateral

These fields of the unions are unordered, unqualified (via the same mechanism as overloadable enums), and unique. The inner fields are each of distinct types: entries used solely for their label (ex. Point) are implicitly of void type. Instantiating a union, then, is done by picking a particular variant, and providing data for the construction of its associated type (or none) as needed. The syntax of this is again best shown with an example. As a special syntax rule, tuples and objects may drop their extraneous parentheses on construction.

var shape: Shape
shape = Point()
shape = Line(5)
shape = Square(side: 5)
shape = Rectangle(x: 5, y: 6)
shape = Triangle(Scalene)

To support proper and powerful pattern matching, we introduce three new extensions to the language: an extension for existing case/of statements to support (nested) structured types (array, seq, tuple, object, union), an extension for of expressions in regular conditionals to support the same, and a new where expression that functions as a reusable "guard". Just like union, these language features combine to provide power with minimal new syntax. Following existing case semantics, this pattern matching may be exhaustive (to be described later), or non-exhaustive and followed by elif and else statements.

Elements of a tuple may be matched in two ways: by providing a value, which will match all tuples with that value at that location, or by binding them to a new identifier, which will match all tuples with any value at that location, and will then be made available as a variable within the branch. New variables introduced by pattern matching in this fashion follow the mutability of the matched-upon statement they are a part of. As is the case elsewhere, binding an element to the _ identifier discards it.

where guards allow for restricting the matching of a case based on an arbitrary conditional expression. They have access to any bound fields: and so can be used to restrict variables to a value while still allowing access to that value among other, arbitrary conditions. where clauses follow the main match section of an of line and are considered to be attached to the main match rather than the of itself to allow for the reuse of labels to follow existing case semantics: see the example below. Of important note is the intersection between variable bindings and repeated cases: in order for a variable to be accessible within the body of the case, it must be bound (in a consistent location) in every label of that statement. We do not disallow this at a binding level because the bindings are still helpful for where clauses.

The exhaustiveness of a pattern match is determined on a best-effort attempt. Fields of booleans, enums, and unions consist of few possibilities and may be exhaustively matched upon. Guards do not contribute to the exhaustivity of a search as they may contain arbitrary conditional expressions: and so must typically be followed by a case for the same match without the guard, or else.

Named tuples and objects are matched identically to tuples, with one exception: the identifier name when binding a field must be the name of the original field. If this is undesired, the syntax oldident: _ as newident is introduced to allow for explicit rebinding. Unions are matched by specifying their label, and then their inner type in parenthesis. Tuples and objects within a union may elide their parenthesis.

func match(shape: Shape) =
  case shape
  of Point():
    ...
  of Line(_ as x) where x in 0..4:
    ...
  of Line(_ as x):
    ...
  # matching an existing variable
  of existingShape:
    ...
  # matching an existing variable, convolutedly
  of _ as x where x == existingShape:
    ...
  # tuples within a union elide their external parenthesis.
  of Square(side: _ as x):
    assert declared(x)
    assert not declared(side)
  # Rectangle(x, y) is syntax sugar for Rectangle(x: _ as x, y: _ as y)
  # matching multiple expressions follows case semantics
  of Rectangle(x, y) where x == y, Rectangle(0, _), Rectangle(_, 0):
    # x is not bound in every case! so it is not accessible
    ...
  else:
    ...

Previous thoughts on matching sequences, arrays, and sets are collapsed in the below block. I no longer think they're terribly important. They are also not updated for the new backwards-compatible _ as x syntax.

Matching sequences, arrays, sets

10 and x[10] == true: ... # lists may be arbitrarily nested, just like every structured type of [[1, 2, 3], [4, 5, 6], [7, 8, 9]]: ... # match an arbitrary list of integers of x where typeof(x) is openarray[int]: ... # match a list of entirely 1s: of x where x.all(y => (y == 1)): ... # the following three statements match identically: of [_, _, _]: ... of x where x.len == 3: ... elif list.len == 3: ... else: ... ``` Matching sets and tables mostly follows list semantics. Both are wrapped in braces instead of brackets or parentheses. As the elements of sets and tables are never named: the `:` does not function as it does in tuples and objects, instead, it takes its regular semantics within tables to denote a key-value pair. Of additional note is the `..` operator: as sets and tables can be of unknown length it is helpful: however, as they are unordered, the `..` operator *must* come last. ```nim func sets(bitset: set[int8]) = case bitset # match the set containing exclusively 1 of {1}: ... # match a set containing at least 1 of {1, ..}: ... # match a set containing only two distinct elements of {x, y}: ... # match a set containing any two distinct elements fulfilling a condition of {x, y, ..} where x + y == 10: ... else: ... func tables(table: Table[int, string]) = case table # match a table containing only these key-value pairs of {5: "foo", 6: "bar"}: ... # match a table containing at least these key-value pairs of {5: "foo", 6: "bar", ..}: ... # match a table containing any key matching a condition of {x: y, ..} where x == y.len: ... ```

While the exhaustivity of pattern matching is important: frequently you want to disregard it, and only worry about a single case. Rust provides the if let construct for this use case, which I find very useful but pretty gross syntactically. Instead, this RFC uses the syntax if shape of Square(side: binding). The bindings become immediately available for use within the body of the if statement and within any parts of the if statement following that clause, as shown by the following examples:

shape = Rectangle(x: 5, y: 6)

if shape of Rectangle(x, y) and x == 5:
  echo "y is " & $y
else:
  echo "failure."

Following the semantics above, these extended case and if ident of match statements should also allow unwrapping Option[T] types. What this looks like is not yet specified.

Code Examples

Algebraic Data Types

An example that perhaps more showcases the utility of ADTs is that of an abstract syntax tree for the simple lambda calculus. ```nim type Ident = string type Expr = ref union Constant: Term Variable: tuple[id: Ident] # Annotation: tuple[expr: Expr, kind: Type] Abstraction: tuple[param: Ident, fn: Expr] Application: tuple[fn, arg: Expr] Conditional: tuple[cond, `if`, `else`: Expr] ``` Note that other structures needed for an implementation of the simple lambda calculus (including a type system) are also efficiently and aesthetically modelled with ADTs. ```nim type Type = union Empty, Error, Unit, Boolean, Natural, Integer, Float, String Enum: seq[Type] Struct: Table[Ident, Type] Function: tuple[from, to: ref Type] type Term = union Unit Boolean: bool Natural: uint Integer: int Float: float String: tuple[len: uint, cap: uint, data: seq[uint]] Enum: tuple[val: uint, data: seq[Type]] Struct: Table[Ident, Term] Function: Expr ```

Pattern Matching

```nim var globalSymbolTable = initTable[Ident, Term]() func execute(ast: Expr): Term = case ast of Constant(_ as term): return term of Variable(id): return globalSymbolTable[id] # other structures of applications + abstractions are invalid of Application(Abstraction(param, fn), arg): globalSymbolTable[param] = execute(arg) return execute(fn) of Conditional(cond, `if`: _ as ifClause, `else`: _ as elseClause): if execute(cond) == Boolean(true): return execute(ifClause) else: return execute(elseClause) else: echo "Failed to execute AST! Improper structure" quit() ```

Backwards Compatibility

This RFC is almost fully backwards compatible with Nim. It will cause issues with anything using the new union or where keywords, and with any templates or macros directly inspecting case or of.

A previous version of this RFC had an issue around pattern matching: the previous syntax was of x, where x is a newly bound identifier. This conflicted with the existing semantics to resolve constants in those expressions. This was fixed by changing pattern matching to require explicitly rebinding the catchall character _, i.e. _ as name, with syntax sugar in selected cases (named tuples & structs).

Notes

To end with some notes: the general design of where guards is to provide all of the parts of pattern matching that cannot be exhaustive, in one general construct. Hence the use of where for things regarding ranges, lengths, existing variables, generic types (although: type constraints can be exhaustive. this is specific enough that i thought it not worth special syntax). where was also chosen over reusing if because I wanted to emphasize that there is not a corresponding elif/else as it is, already, what amounts to an elif clause within the broader case statement.

I think I like oldident: newident for field renaming. It matches object constructors. oldident as newident is an alternative, however.

I only mentioned it briefly, but because I've seen it discussed in other implementations of pattern matching in Nim: the mutability of the introduced bindings while pattern matching follows from the mutability of the matched-upon object, that is, you're directly mutating the matched-upon object. Other languages follow this pattern and I see no reason why this + variable shadowing if you really want to make it mutable are good enough.

I'm considering allowing guards to be reusable on for blocks a la Swift: with for binding in iterator where condition acting as syntax sugar to the common for binding in iterator: if condition. Is this desired? Is this good or bad for language consistency? Are there other places where clauses could be used?

The else statement and case _ are functionally the same. I believe this is currently the case.

References and Acknowledgements

https://github.com/nim-lang/RFCs/issues/245 and https://github.com/nim-lang/RFCs/issues/368
Nim: patty, gara, fusion/matching, fungus
Python: PEP 635 and PEP 636
Racket: the match statement
Swift: the where statement
Grain: nested pattern matching
Rust: the match statement

Many thanks to fungus: which implements a near-identical union and quite similar match already as macros. Many thanks to patty: particularly for their "does not yet work" section, ha. Many thanks to Beef, for having good taste and calling my stupid ideas stupid.

levovix0 commented 1 year ago

Is it really necessary to change the language syntax for this? Maybe it would be better to standardize macros that implement the same thing on the existing syntax?

The current variant objects are really quite unconvinient, and it would be nice to have syntactic sugar over them to create algebraic types.

juancarlospaco commented 1 year ago

Python: Success, used by people, part of StdLib (very recent on very old lang).
Racket: Success, used by people, part of StdLib.
Swift: Success, used by people, part of StdLib.
Grain: Success, used by people, part of StdLib.
Rust: Success, used by people, part of StdLib.
Nim: Fail, not widely used by people, not part of StdLib.

🤔

Araq commented 1 year ago

That doesn't mean much beyond the fact that people avoid dependencies for various reasons.

Araq commented 1 year ago

So what does this RFC accomplish that e.g. fusion/matching does not?

You don't need to teach me about ML and Rust and Racket and Foozbaz, I know. You need to tell me why you found the existing solutions based on macros lacking.

juancarlospaco commented 1 year ago

fact that people avoid dependencies

We agree.

omentic commented 1 year ago

So what does this RFC accomplish that e.g. fusion/matching does not?

The union type is new. Something similar exists as a macro in beef/fungus, but the implementation differs from how it would be implemented in the compiler for performance and semantic reasons.

fusion/matching also provides comprehensive structural pattern matching, there is no feature discrepancy between the two. This RFC is an attempt to integrate it into the language more: by means of reusing existing case syntax and pulling things that would require specific new syntax and not help with exhaustiveness into where clauses (this is described in more detail in the first paragraph of the notes section above).

After work, I'll write up a detailed comparison between the two because I think it reveals some interesting design decisions. But a place to look for now, if interested, is the difference between this RFC's use of where guards vs. fusion/matching's use of until, all, some, @. Revisiting fusion also reminds me I forgot to describe how tables are matched. It's exactly as you would expect and consistent with tuples etc, I'll add this later too.

omentic commented 1 year ago

Also, a little note on other languages: despite Racket's match statement being the most powerful of the bunch and Python being the closest language to Nim syntactically, I took the least inspiration from them. I find Python's match to be grotesquely complex for relatively little benefit, and think where clauses a la Swift to be a much nicer solution. This proposal is also purposely less powerful than Racket: Racket's ability to match on repetitions of arbitrary nested list expressions requires special syntax and is only particularly helpful to a list-oriented language. Nonetheless, it can be done with where clauses if needed (this is a trend...).

arnetheduck commented 1 year ago

Very nice!

ADT's are to Nim case types what scalars are to vectors - you can express everything an int can express using a seq[int] but doing so is cumbersome in code and misses out on many of the assumptions you can make and therefore on the help that the language and compiler can bring.

It's true that the proposed type sort of is a special case of a case object with a specific form of enum, exactly one discriminator and no other members - but that special case is useful in many constructs which logically express "one and only one of these things" - the rest is followup on top of that concept.

Rather than asking "can macros solve this?", the better question is "can we create better macros with this feature present", and there the answer is undoubtedly yes, because it's such a fundamental construct that appears over and over in many code patterns.

Many things become more simple when you can these assumptions about a type: as mentioned already: resetting or overwriting during serialization, exhaustive case statements, etc.

Here's a simple example which relies on the knowledge that a type takes on exactly one shape: because the type is guaranteed to be exactly one of the enumerated options, we can reason about that option without specifying the full type but still express a meaningful relationship to it. In Result[T, E] this is often useful because you want to be able to express a type that matches any Result[T, X] where X is known but T is not to express the idea that you are working with "the error side of type X of any Result" without knowing the exact type of the result (when returning an error you don't care about the "value" side - this is supplied by the context in which the error is being returned, not the error return itself) - such a type can safely be converted to a "full" Result because it is known per language semantics that Result itself takes one and only one of the given options.

jmgomez commented 1 year ago

I generally like the idea and I was one of the complainers about the PM situation. But as I said in Discord, if this ever gets accepted, who is going to take ownership and implement it. Especially PM? I really hope that the roadmap is unaltered and IC still comes right after 2.0.

So what does this RFC accomplish that e.g. fusion/matching does not?

At the very least, it will be a workaround for this :P https://github.com/nim-lang/Nim/issues/20435

konsumlamm commented 1 year ago

How would the proposed syntax distinguish between a single field that is a tuple and multiple fields? Using tuples for the latter is not nice imo. I'd prefer something closer to the syntax for constructing/matching the variants (which is also how Rust and Swift do it):

type Shape = union
  Point()
  Line(int)
  Square(side: int)
  Rectangle(x, y: int)
  Triangle(TriangleKind)

Araq commented 1 year ago

In my unwritten proposal I used this syntax:


type
  Tree = ref case kind # also possible without the `ref`
    of StringLeaf:
      str: string
    of IntLeaf:
      x: int
    of Parent:
      kids: seq[Tree]

No new keyword required and the syntax is optimized for pasting into a case statement.

omentic commented 1 year ago

@konsumlamm Currently:

type Shape = union
  Rectangle: tuple[x, y: int]
type Shape = union
  Rectangle: tuple[tuple[x, y: int]]

I picked this syntax to be as close to Nim's existing semantics as possible while still providing the full power of ADTs. I like your syntax more aesthetically, but IMO it doesn't fit in with tuple / object types as well. Also note that the inner type of Rectangle in both of our examples is tuple[x, y: int]: even if it's not explicitly specified in yours.

@Araq I assume the unwritten proposal was mostly for syntactic sugar over the existing variant object syntax? A benefit of the union proposal and ADTs as a whole is that they let you abstract over arbitrary types, not just named fields of objects. (eg. you don't really need the labels str, x, kids in your Tree implementation, just the types)

Araq commented 1 year ago

eg. you don't really need the labels str, x, kids in your Tree implementation, just the types

That's a good point but easily fixed:


type
  Tree = ref case # also possible without the `ref`
    of StringLeaf:
      string
    of IntLeaf:
      int
    of Parent:
      seq[Tree]

The advantages remain: Natural Nim syntax, no new keyword, easily turned into a case statement.

omentic commented 1 year ago

I disagree that that's natural Nim syntax, I find it subjectively very ugly, the lack of a field following case inconsistent, and the raw types in branches weird. I don't think being easily turned into a case statement is a particular advantage, and it really doesn't follow type declaration syntax or semantics (I would be quite thrown off learning Nim for the first time).

I'm not particularly attached to the union keyword but I think introducing a new keyword to make it a first-class type on the level of object / tuple / enum would be worth the tradeoff from a language design perspective.

omentic commented 1 year ago

I suppose we could also do this leaving eg. nested unambiguously available for use elsewhere, though I don't like it.

type Tree = ref nested enum # oops need indirection
  StringLeaf: string
  IntLeaf: int
  Parent: seq[Tree]

Araq commented 1 year ago

I care about keeping the of branches, if you want to change case to sumtype or union I don't mind it. Though both sumtype and union are ugly. Maybe enum?


type
  Tree = ref enum # also possible without the `ref`
    of StringLeaf:
      string
    of IntLeaf:
      int
    of Parent:
      seq[Tree]

Araq commented 1 year ago


type Tree = nested enum
  StringLeaf: string
  IntLeaf: int
  Parent: seq[Tree]

is wrong for two reasons: You need the ref indirection in order to nest it. And StringLeaf: string is not a field of type string, it is a branch that supports an unnamed string field.

omentic commented 1 year ago

That kinda goes against the existing semantics of enum being explicitly an enumeration though, and also conflicts a bit semantically (not in syntax) with the existing feature that lets you set labels of an enum to values. My original approach was to overload the enum type but I found it conflicted too much with existing expectations of enum.

Now that I think about it more, I really don't like the idea that you can copy and paste the of branches into a case statement actually: most of the times when I use ADTs in other languages I don't want to handle all the cases, and/or want to handle particular cases differently, via structural pattern matching or guards or what not. The extended examples I picked for this RFC are a bit contrived in that they are for interpreting a language where you have to handle every case without restrictions. I think that'd kind of lead people in the wrong direction wrt. pattern matching.

re: ref, oops. ref indirection is actually supported and necessary in this RFC btw, there's some examples in the collapsed Code Examples block.

omentic commented 1 year ago

And StringLeaf: string is not a field of type string, it is a branch that supports an unnamed string field.

Yeah, even though they're comparable to objects and the syntax looks similar, they're not. This is one of the few parts that differs from existing semantics. I personally think it's fine - concepts do the same thing and introduce different-than-objects but still very consistent syntax within their type declaration.

Araq commented 1 year ago

I think that'd kind of lead people in the wrong direction wrt. pattern matching.

Actually, this is how you ensure correctness, having thought about it, you handle every case explicitly. It's the "oh and in all other cases (else) do ..." that produces bugs. In the Nim compiler and everywhere else.

metagn commented 1 year ago

Maybe case object?

nc-x commented 1 year ago

If a non-macro solution for ADTs/Pattern Matching is finally being considered, then my vote goes to a rust/haxe like syntax only. A lot of the syntaxes suggested above are confusing / error prone / ugly just like the existing object variant syntax (imo).

deech commented 1 year ago

ML style ADTs would be a great idea and I won't quibble about the syntax but I worry that too much conditional freedom on each match arm eg. x where typeof(x) is ... will cripple exhaustiveness checking which is biggest unseen benefit of ADT's. I would be happy with much less pattern matching expressivity in return for the compiler flagging missing patterns. IMO previous efforts like fusion/matching and patty do too much.

Have you looked D's sumtype? It is inspired by ADTs but leans heavily into static introspection and would be a nicer fit for Nim vs. trying to copy Haskell/OCaml.

konsumlamm commented 1 year ago

Have you looked D's sumtype? It is inspired by ADTs but leans heavily into static introspection and would be a nicer fit for Nim vs. trying to copy Haskell/OCaml.

D's sumtype doesn't support nested patterns, which is half the point of pattern matching and would make a lot of things more cumbersome, so I don't think that's an approach we should follow.

deech commented 1 year ago

D's sumtype doesn't support nested patterns, which is half the point of pattern matching and would make a lot of things more cumbersome.l, so I don't think that's an approach we should follow.

It's more cumbersome but when you can only destructure on one thing at a time static analysis is much easier, you can get better IDE support, possibly even exhaustiveness checking, and maybe even case splitting like Idris.

Araq commented 1 year ago

possibly even exhaustiveness checking

Nim's case statement already does exhaustiveness checking and I have no intention of moving to something that lacks it.

deech commented 1 year ago

Nim's case statement exhaustiveness checking is pretty brittle:

type
  AnEnum = enum
    A
    B

let e = A

case e
of A: echo "A"
else:
  case e
  of B: echo "B"

Here I get the error: /tmp/testcase.nim(11, 3) Error: not all cases are covered; missing: {A}

Araq commented 1 year ago

It's not "brittle", it works as intended. I'm also not familiar with any mainstream language that offers exhaustiveness checking over nested case/match statements in the way that you outline.

deech commented 1 year ago

Not mainstream, and not completely generally but with a ton of effort Haskell does. But the point is by making the pattern matching more "cumbersome" but also more analyzable you can get this but without most of the implementation complexity.

omentic commented 1 year ago

@deech Ah, I'll write up a little thing on this with more detail. But basically: I'm very aware of this, and designed it so that the where statement is used for things that cannot be exhaustively checked.

@metagn I don't love case object - I think that gives the impression that it is fundamentally an object, which it isn't. Maybe case enum though? Though I do much prefer a new keyword.

Araq commented 1 year ago

Though I do much prefer a new keyword.

A new keyword would be nice if we could find a good one but so far I don't like any of the proposed new keywords. FWIW Swift reuses "enum" and Swift in general chooses nice names.

haxscramper commented 1 year ago

Skipping most of the syntactic details (union vs case kind vs case enum vs sumtype etc.) as they constitute about 5% of the actual problem in question (after deciding on placement of the library/language addition, specific list of semantics that would be available and so on)

Maybe it would be better to standardize macros that implement the same thing on the existing syntax?

Macros like these have issues along the lines "can't work in generics" https://github.com/nim-lang/Nim/issues/20435 -- that is something that has been an issue since day 1 and small number of complaints about this is something that I can only attribute to the low use of the fusion/matching
At this point, it does not seem that you can generate a mutually recursive object using macros -- can generate a single type section, but not define things that cross-reference each other. Might've changed since last time I've checked, though.

Is it really necessary to change the language syntax for this?

case statement macros that current let's try to standardize" solution overload existing language syntax needlessly.

That doesn't mean much beyond the fact that people avoid dependencies for various reasons.

True, which means that if there were to be any serious talk about having a "standard" solution for pattern matching, it would have to be in the standard library or in the language, otherwise why even bother, someone can write just another library.

At some point, people start having ideas that a particular feature would be very useful if it was readily accessible and try to promote its addition and more widespread use. Given the existing track record (writing library makes it half-dead on arrival) and belief that addition of such feature to the language would be a net benefit, it makes sense to suggest addition to the core language.

So what does this RFC accomplish that e.g. fusion/matching does not?

Better syntax that seems to be more in line with what nim itself looks like
Ability to see the shortcomings of earlier implementation (awkward @thing syntaxes as one of them)
Suggestion to add new way of defining variant types (reasoning elaborated on the main text)
Suggestion to add into the core language -- as a feature that seems to be wanted by quite a few people and implemented in the main repository instead of the same "wtf is this" fusion aka 4 commits since 2021 is likely to see more attention (especially considering what sort of deprecated stuff has been laying in the compiler for years before finally being addressed at 2.0 -- pattern matching is evidently a new, very useful feature that people will use once available)

arnetheduck commented 1 year ago

In the interest of shipping this feature, I would strongly suggest that this RFC focuses on the abstract data type definition which loosely can be described as a variant object with exactly one case in it.

I would go on to leave pattern matching for a completely separate RFC and treat / discuss it orthogonally.

The reason for this is simply that the two affect completely different parts of the language - the former enriches the type system with a new construct while the latter focuses on syntactic convenience now made possible - you can imagine for example that we might decide on shipping the data types in the language and relegating matching to libraries / macros (in std lib or not doesn't really matter - as can be seen in any modern language with a good package manager, people have 0 problems with using dependencies in general as long as the barrier of entry is low - this is mostly a cultural problem local to languages that lack good dependency management integration).

omentic commented 1 year ago

I have updated the RFC: adding a section on matching sets + tables, and adding a section on an equivalent to Rust's if let for easy unwrapping when you don't want to pattern match. Thanks @elcritch for the syntax suggestion.

omentic commented 1 year ago

@arnetheduck As far as implementation goes, I agree: the work will definitely largely separate into union types + unwrapping vs. pattern matching for structural types. But union types are useless without some form of structural pattern matching: even if this isn't comprehensive to begin with, full pattern matching should probably be discussed here to avoid creating conflicts in the implementation in the future.

konsumlamm commented 1 year ago

I'd be wary of special pattern matching syntax for tables. Afaik, currently, there is no builtin syntax for tables (the Table type is not builtin, while set and seq are).

juancarlospaco commented 1 year ago

fact that people avoid dependencies

it would have to be in the standard library

Ive seen code " include'ing by path " the pattern matching lib from the compiler tests/ folders. 🤣

elcritch commented 1 year ago

But union types are useless without some form of structural pattern matching: even if this isn't comprehensive to begin with, full pattern matching should probably be discussed here to avoid creating conflicts in the implementation in the future.

I'd generally agree. Without pattern matching sum types would only provide convenient constructors which macros can provide. I'd still think it'd be great to have but would feel lacking. Really the implementation could ship in multiple parts, but having even a basic pattern matching syntax agreed upon would be useful. Maybe skip where clause semantics for a part 2.

Importantly having pattern matching syntax in the language would prevent splintering and requiring folks to learn multiple different macro based versions. I already didn't want to mix Fungus and Patty in a project as they have different semantics.

Also I don't believe it's possible to do the if shape as Rectangle(x, y) without compiler changes IIRC. ElegantBeef and I were trying some hacks, but it was late night.

BTW, here's another possible syntax:

type
  Tree1 = ref enumeration
    of StringLeaf:
      str: string
    of IntLeaf:
      x: int
      y: int
    of Parent:
      kids: seq[Tree]

  Tree2 = ref enumeration
    of StringLeaf: string
    of IntLeaf: tuple[int, int]
    of Parent: seq[Tree]

The downside is that it's slightly confusing at first why there'd be an enum and enumeration that'd be easy to explain: simple enum, complex enumeration. Personally I like having both a simple enum and a complex enumeration.

haxscramper commented 1 year ago

@konsumlamm

I'd be wary of special pattern matching syntax for tables.

It should be for contains() and [] operator obviously, not for Table type.

Aka {6: x} is it.contains(6) and x = it[6]

arnetheduck commented 1 year ago

are useless without some form of structural

I think my point is more that pattern matching should take the whole language into consideration and not only union types (ditto union types themselves). One can easily imagine pattern-matching over other things than union types, including value matching / conditionals / etc and treating them here in the same RFC not only risks blocking union types on bikeshedding of pattern matching but indeed risks not giving due attention to each of them in the context of the rest of the language.

I don't disagree that it's good to keep both things in mind during initial design, but union types are useful on their own (because of the expressivity they add to the type system) and pattern matching isn't limited to them.

Araq commented 1 year ago

I think my point is more that pattern matching should take the whole language into consideration and not only union types (ditto union types themselves).

If only we had capable language designers who could have anticipated that... ;-)

elcritch commented 1 year ago

Good points. I wonder if there's a way to enable just a simple x as Some(y) type syntax using an overloaded operator? With that and case macros one could easily build most (all?) of the pattern matching bits as a library. If that's possible as an operator? Maybe a tuple override.

Then it'd just be settling on a union / sum type / enumeration / maybe-an-adt syntax. That should be easy. ;)

Then we could get sample PRs to get the ball rolling.

haxscramper commented 1 year ago

If only we had capable language designers who could have anticipated that...

That just allows you to write macro that is syntactically overlapping with case statement ... so basically the link should point to the macro section actually

haxscramper commented 1 year ago

@j-james I think aside from comparison with hmatching on features you can also ask for feedback on forum/discord like what sucks?, What you don't like, what could be added, what you didn't understand how to use, what feels off -- just some quick reactions

omentic commented 1 year ago

Yes, that kind of feedback is very welcome (along with all other feedback)! I've posted the link to this a few times but have not explicitly asked people to go through and evaluate it yet, will do so in the morning.

metagn commented 1 year ago

Does a language implementation of something like https://github.com/alaviss/union/ or like Crystal's runtime union types (but not implicit or merged with union typeclasses) solve the problems listed in the issue?

type
  A = object
    x, y: int
  Foo = union(int, A)

# under the hood an object variant, but structural and commutative
assert Foo is union(A, int)

proc `$`(f: Foo): string =
  if f of int: # reuse object inheritance check syntax
    result = $int(f)
  elif f of A:
    let f = A(f)
    result = "A(" & $f.x & ", " & $f.y & ")"
  # or as case statement
  case f
  of int: $f # automatic type narrowing? doesn't really make sense but not essential
  of A: "A(" & $f.x & ", " & $f.y & ")"
  # cases exhausted

var f: Foo
# A is convertible to Foo
f = A(x: 1, y: 2)
assert $f == "A(1, 2)"
# so is int
f = 456
assert $f == "456"

# recursion and distinct types
type
  StringLeaf = distinct string
  IntLeaf = distinct int
  ParentTree = distinct seq[Tree]
  Tree = ref union(StringLeaf, IntLeaf, ParentTree)
let tree = Tree(ParentTree("abc"))

(I expect recursion here to work because the following code works, but it might not be reliable)

```nim type Bar = int | ref Foo Foo = float | Bar assert int is Foo assert (ref int) is Foo assert (ref ref int) is Foo assert float is Foo assert (ref float) is Foo assert (ref ref float) is Foo assert string isnot Foo assert (ref string) isnot Foo ```

This way each branch is tied to a type, construction and destructuring are much simpler to implement (and not ambiguous with existing case statements like every pattern matching solution described above), you don't need to worry about field names or whatever, you can even do things like A of Foo that allow for more introspection.

Is this sound, or is there maybe a problem it doesn't solve?

Araq commented 1 year ago

That just allows you to write macro that is syntactically overlapping with case statement ...

I know you don't appreciate the feature but I don't know why that is. It unifies the potential different pattern matching DSLs and is as beneficial as having a unified syntax for assignment and equality for generic code.

haxscramper commented 1 year ago

unifies the potential different pattern matching DSLs

It only unifies a single keyword, everything else is up to the macro author to write, so it can be hardly called 'unifying'.

Assignment and equality seem to have different syntax: = vs == and they don't allow the user to put on complex custom DSL in the body.

haxscramper commented 1 year ago

In order for the unification claim to hold true we must ensure that every DSL implemented with case statement follows the same patter and largely has the same semantics, which more or less implies some standardized approach for this, in which case it does not make any sense to aim for several different macros that are "unified" yet different somehow, as opposed to a single standard solution.

Araq commented 1 year ago

Let's assume I have a custom type with two fields that I want to be able to match against an array-like pattern:


type
  Pair = object
    x, y: int
var p: Pair

macro `case`(p: Pair; u: untyped) = ...

case p
of [x, _]:

Now if I happen to use something more complex than int inside Pair I can call the case macro for type T inside my case macro for type Pair. That's the power of a unified interface.

Just like you can lift == for seq[T] as long as T has a ==.

nim-lang / RFCs