nim-lang / RFCs

A repository for your Nim proposals.
136 stars 23 forks source link

Sum types, 2024 variant #548

Open Araq opened 10 months ago

Araq commented 10 months ago

Sum types, 2024 variant

There is a new type construct, case that gives Nim sum types, comparable to ML's, Rust's, etc.


type
  Option[T] = case
    of None: discard
    of Some: T

  Either[A, B] = case
    of Le: A
    of Ri: T

  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

Constructing a case branch uses the branch name plus its payload in parenthesis. However, BinaryOpr(BinaryNode(a: x, b: y)) can be shortened to BinaryOpr(a: x, b: y), an analogous shortcut exists for tuples.

To access the attached values, pattern matching must be used. This enforces correct access at compile-time.

Access via as

The syntax of Branch as x can be used to unpack the sum type to x.


proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b

  of UnaryOpr as x:
    traverse x.a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

of Branch as variable is the basic form. For of Branch: T the variable has the type T or var T depending on the mutability of the expression x in case x.

Pattern matching

A variable of type T might be inconvienent so there also pattern matching combined with unpacking:


proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr(let a, let b):
    traverse a
    traverse b

  of UnaryOpr(let a):
    traverse a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

These new syntaxes of Branch as x and of Branch(let x) can later be naturally extended to if statements: if n of BinaryOpr as x or if n of Some(var n).

More complex pattern matching

Proposed syntax:


case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through
of BinaryOpr(_, let b):
  # _ can be used to ignore a value

Serialization

There are two new macros that can traverse sum types:

  1. constructCase takes in a type T and an expression in order to construct a case type T.
  2. unpackCase takes in a value of a case type and an expression in order to traverse the data structure.

For example:


type
  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

proc store(f: var File; x: int) = f.write x # atom
proc store(f: var File; r: ref Node) = store r[] # deref `ref`
proc store[T: object](f: var File; x: T) =
  # known Nim pattern:
  for y in fields(x): store y

proc store[T: case](f: var File; x: T) =
  unpack x, f.store, f.store
  # `unpack` is expanded into a case statement:
  # `case x[]
  # of Val as val: f.store(kind); f.store(val)`
  # ...

proc load[T: case](f: var File; x: typedesc[T]): T =
  let tmp = f.load[:IntegralType(T)]()
  result = constructCase(T, tmp, f.load)

  # constructCase is expanded into a case statement:
  # `case tmp
  # of Val: Val(f.load[:BranchType]())`
  # ...

Anon object types

Later we can add more sugar so that the definition can be simplified to:


type
  Node = case
    of BinaryOpr: object
      a, b: ref Node
    of UnaryOpr: object
      a: ref Node
    of Variable: string
    of Value: int

This way the simplicity is kept that every branch is tied to exactly one type which makes iteration over case types in a generic context much easier.

metagn commented 10 months ago

Some notes with regards to the implementation (feel free to ignore if not interested):

This design is about as simple as it gets for the parser (parse case differently in type contexts), type graph (other proposals expected of Some(T), of Some: T, of Some: field: T to all work), and the initial deconstruction frontend but as mentioned in https://github.com/nim-lang/RFCs/issues/527#issuecomment-1860602260 there is some work to do for the construction frontend.

None, BinaryOpr etc. would probably best be a new symbol kind skCase that can also undergo generic overloading for the compiler to deal with it in call syntax. We could implement this from scratch but sigmatch has the logic for it for routines, meaning we should probably do the refactor that generalizes sigmatch to include stuff like type conversions. Also sigmatch could take the expected return type into account, i.e.:

type
  Option[T] = case
    of None: discard
    of Some: T
  List[T] = case
    of None: discard
    of Cons: (T, ref List[T])

let x: Option[int] = None()

should work.

arnetheduck commented 10 months ago

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):

let x: Either[int, string] = Le(42)

ie Le(42) does not need to know about string.

ee7 commented 10 months ago

Simple pattern matching

The syntax of Branch as x can be used to unpack the sum type to x.

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b
  # ...

of Branch as variable is sugar for of Branch(let variable). of Branch(var variable) is also available allowing mutations to variable to write through to the underlying enum object.

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

Araq commented 10 months ago

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

Maybe even Branch as x should not be done and only Branch(let x) should be available. We don't have a good history of providing more than one syntax and letting people choose, it always ends up in mild fights in the style guides.

Not applicable anymore as there is an inherent difference between as x and var x, the x has a different type.

SirOlaf commented 10 months ago

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):

let x: Either[int, string] = Le(42)

ie Le(42) does not need to know about string.

The devel compiler has {.experimental: "inferGenericTypes".} which lets you do that for functions already, hopefully a similar mechanism can be applied here.

ASVIEST commented 10 months ago

I think this will be better for pattern matching

case n
of BinaryOpr(a: var le, b: UnaryOpr(a: let ri)) if le == ri:
  le = ... # can write-through

It's more like an object constructor and it allows to skip sth like:

case n
of BinaryOpr(b: UnaryOpr(a: let ri)):
  echo ri # I dont care about left operand in BinaryOpr
Araq commented 10 months ago

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

ASVIEST commented 10 months ago

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

this may be inconvenient for objects with a fairly large number of fields. Maybe we can allow both of these syntaxes like https://github.com/nim-lang/RFCs/issues/517 (or maybe better https://github.com/nim-lang/RFCs/issues/418) ?

type
  SampleChunk = object
    chunkID: uint32
    size: uint32
    manufacturer: uint32
    product: uint32
    samplePeriod: uint32
    unityNote: uint32
    pitchFraction: uint32
    smpteFmt: uint32
    smpteOffset: uint32
    loopCnt: uint32
    dataSize: uint32
    sampleLoops: seq[SampleLoop]
    data: seq[byte]

  SampleLoop = object
    id: uint32
    typ: uint32
    start: uint32
    endBlock: uint32
    frac: uint32
    playbacks: uint32

  WaveChunk = case
    of Sample: SampleChunk
    ...

var x = Sample()
case x
of Sample(_, _, _, _, _, _, _, _, _, _, let dataSize, _, _):
  echo dataSize
else: discard
# vs
case x
of Sample(dataSize: let dataSize):
  echo dataSize
else: discard

allowing both syntaxes it will be convenient both for objects with a large number of fields and for objects with a small number BTW, apparently I’m not the only one who thought that fieldX: let x was also needed: https://github.com/nim-lang/RFCs/issues/537#issuecomment-1752100391

Araq commented 10 months ago

It's just:


case x
of Sample(let x):
  echo x.dataSize
else: discard

To select a single field there is no reason to unpack stuff. Keep things simple.

ASVIEST commented 10 months ago

It's just:

case x
of Sample(let x):
  echo x.dataSize
else: discard

To select a single field there is no reason to unpack stuff. Keep things simple.

It means that in

case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through

let b is UnaryOpr and not ref Node ? var a is ref Node what ?

ASVIEST commented 10 months ago

Maybe you mean this ?

case x
of Sample():
  echo x.dataSize
else: discard
Araq commented 10 months ago

Sorry, I made a mistake. Please re-read the proposal. :-/

Araq commented 10 months ago

I mean this:

case x
of Sample as y:
  echo y.dataSize
else: discard
Varriount commented 10 months ago

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not[^1]. Allowing positional pattern matching for object types would result in their public field ordering forming part of their API; developers would not be able to move fields around, nor insert new fields before existing fields, nor remove existing fields not at the end of the object, without introducing compatibility-breaking changes.

Considering that only a small subset of objects have a conceptually "natural" field ordering, introducing such behavior doesn't make sense. For objects that do have a conceptually significant field ordering, a tuple type is more appropriate.

[^1]: Though they do have minor features that rely on field ordering.

ASVIEST commented 10 months ago

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not

In general, I agree, but I think that instead of the complete absence of a positional pattern matching for objects, it should be left for objects with a new pragma {.positional.}

Araq commented 10 months ago

Can we focus on the question of whether Branch as x vs Branch(let x) with distinct meanings is a good design? Can it be avoided? Did anybody even notice?

metagn commented 10 months ago

They could be merged into (assuming Branch(let x) means "unpack the first field of the branch type to x" and not "unpack the full branch to x"):

of Branch as BranchType(let x):
of Branch as (let x): # for anonymous objects/tuples, maybe requires trailing comma
of Branch as (let x, let y):
of Branch as let (x, y):
of Branch as (fieldName: let x):
# if we don't care to make bindings explicit, we can always assume identifiers to be `let`/`var`
of Branch as BranchType(x):
of Branch as (x):
of Branch as (x, y):
of Branch as var x: # makes this possible for what it's worth
of Branch as let x: # and this

# this would be a reuse of a potential mechanism like:
BranchType(let x) = y
let BranchType(x) = y # x assumed to be let
let (x) = y # language already allows this to unpack unary tuples
(let x, let y) = z

If all of these are ugly, I don't see how Branch(let x) as sugar for Branch as BranchType(let x) is unsound.

Araq commented 10 months ago

I don't understand your idea.

konsumlamm commented 10 months ago

Personally, I would prefer something closer to Rust:

type
  Option[T] = case
    of None()
    of Some(T)

  Either[A, B] = case
    of Le(A)
    of Ri(T)

  Node = case
    of BinaryOpr:
      a, b: ref Node
    of UnaryOpr:
      a: ref Node
    of Variable(string)
    of Value(int)

At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

ASVIEST commented 10 months ago

At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

I think that when sum types is mapping X -> Y Is really nice, you can store Y in different module or just object and make code mode readable and flexible, you can add logic for Y and then just use it. When you not need Y type, just use tuple:

type
  Node = case
    of BinaryOpr: (a, b: ref Node)
    of UnaryOpr: (a: ref Node)
    of Variable(string)
    of Value(int)

doing it like a Rust removes flexibility without increasing the simplicity of the code. It is also possible that such sum types are simpler to implement in backends.

It also fits better into the logic of the case statement:

type
  Sth = case
    of A1: B1
    of A2: B2

B1 is not a just field list, it's type

Varriount commented 10 months ago

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    ...

It is also consistent with how most sum types are currently implemented (via object variants):

type Node = object
  case kind: NodeKind
  of nkBinaryOpr:
    ...

proc traverse(n: ref Node) =
  case n[].kind
  of nkBinaryOpr:
    ...

Furthermore, it would allow retrieving the "kind" of a sum-typed variable in other contexts (such as logging/debugging).

proc traverse(n: ref Node) =
  echo repr(kind(n))



Regarding using sum types, couldn't the current rules for object variants be used, where a variable can be used as a tested type when the compiler can statically determine that it is that actual type?

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    echo(n.left, n.right) # BinaryOpr-specific fields
  of UnaryOpr:
    echo(n.term) # UnaryOpr-specific field
    ...

Again, this would be consistent with how object variants are currently handled. I would need to defer to the compiler-devs, but I believe this might also allow re-use of existing compiler code too.

I know that the above syntax isn't "exciting" or "new", but it is consistent, reducing the amount of surprise/complexity that is needed to use sum objects - from a user's perspective, a sum type works "just like" an object variant (which it arguably is). It also makes migrating existing sum-types-via-object-variant code very easy.

Pattern matching, I feel, would be best implemented in a different proposal. I think that tuple unpacking and templates already serve to address most of the "boilerplate" that pattern matching is usually meant to address.

(If desired, I can go into more detail regarding what I feel are the benefits of the above syntax, but I figured I would write a shorter explanation first)

metagn commented 1 month ago

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

Didn't see anyone else bringing this up but it is a good point that passing around the kind of the type as a value should still be possible. A less ambiguous name for this could be caseof, i.e.

type Foo = case
  of A: int
  of B: string

let foo = B("abc")
let kind = caseof(foo)
echo kind # B

Then we could also have caseof(Foo) as the type of the individual kind symbols, as a compiler-generated enum type (?).

nikhilsimha commented 2 weeks ago

whether Branch as x vs Branch(let x) with distinct meanings is a good design?

It feels like the latter is "unpacking" the former is "assignment". I lean towards those two having different meanings. It is also nice to NOT have two syntaxes to do the same thing.

Outside of pattern matching. Branch(x) is a constructor - x is the internal value I think it will be more natural to keep the Branch(let x) behave like a "deconstructor" and Branch as x behave like assignment.

Did anybody even notice?

I noticed it right away.

McChuck05 commented 1 week ago

It's an improvement over the current object variant mechanism. I like the case foo of bar as baz being distinct from case foo of bar(var baz1, let baz2) I feel both forms have real value.

I'm still not really sure why the discrimination isn't simply done on the inner type, without adding a new name to each type in the sum. After all, it's really the type we're interested in. Even an "anonymous" type could be be used with this. It would also make the 'as' simpler, although perhaps less useful. But maybe that's a good thing?

type
  BinaryNode = object
    a, b: ref Node

  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryNode
    of UnaryNode
    of OtherNode = ref Node  # "anonymous" type
    of string
    of Name = string  # "anonymous" type
    of int

proc traverse(n: ref Node) =
  case n[] as x      #  The "as x" doesn't seem as helpful this way, but it's still nice.
  of BinaryNode:
    traverse x.a
    traverse x.b
  of UnaryNode:
    traverse x.a
  of OtherNode:
    traverse x
  of string:
    echo x
  of Name:
    echo "Hello ", x
  of int:
    counter += x

var myNode: Node[int] = 42

#  Which is better looking?  Which is easier to implement?
if myNode[int]:  
  echo "The answer is: ", myNode + 0
elif myNode[] is Name:  
  echo "Hello, " & myNode & "!"
elif myNode of string as ovaltine:
  echo "The secret decoder message of the day is: ", ovaltine