Open Araq opened 10 months ago
Some notes with regards to the implementation (feel free to ignore if not interested):
This design is about as simple as it gets for the parser (parse `case` differently in type contexts), type graph (other proposals expected `of Some(T)`, `of Some: T`, `of Some: field: T` to all work), and the initial deconstruction frontend, but as mentioned in https://github.com/nim-lang/RFCs/issues/527#issuecomment-1860602260 there is some work to do for the construction frontend.
`None`, `BinaryOpr` etc. would probably best be a new symbol kind `skCase` that can also undergo generic overloading for the compiler to deal with it in call syntax. We could implement this from scratch, but `sigmatch` has the logic for it for routines, meaning we should probably do the refactor that generalizes `sigmatch` to include stuff like type conversions. Also `sigmatch` could take the expected return type into account, i.e.:
type
  Option[T] = case
    of None: discard
    of Some: T
  List[T] = case
    of None: discard
    of Cons: (T, ref List[T])
let x: Option[int] = None()
should work.
The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for `nim-result`):
let x: Either[int, string] = Le(42)
i.e. `Le(42)` does not need to know about `string`.
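For comparison, here is a minimal sketch of how this looks today with an object variant. The names `EitherKind`, `leVal`, `riVal` and `makeLe` are made up for illustration and are not part of `nim-result` or of the proposal. Without any inference from the expected type, the second generic parameter has to be written out at the call site, which is the ergonomics gap described above.

```nim
type
  EitherKind = enum
    ekLe, ekRi
  Either[A, B] = object
    case kind: EitherKind
    of ekLe: leVal: A
    of ekRi: riVal: B

# Hypothetical helper that builds the left branch of an Either.
proc makeLe[A, B](a: A): Either[A, B] =
  Either[A, B](kind: ekLe, leVal: a)

# `B` does not appear in the argument list, so it cannot be inferred
# from `42` alone and has to be spelled out explicitly here.
let x = makeLe[int, string](42)
echo x.leVal
```

The wish expressed in this thread is that the type annotation on the left-hand side would be enough, so that the explicit `[int, string]` could be dropped.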
> Simple pattern matching
> The syntax `of Branch as x` can be used to unpack the sum type to `x`.
> proc traverse(n: ref Node) =
>   case n[]
>   of BinaryOpr as x:
>     traverse x.a
>     traverse x.b
>   # ...
> `of Branch as variable` is sugar for `of Branch(let variable)`. `of Branch(var variable)` is also available allowing mutations to `variable` to write through to the underlying enum object.
Given that `of Branch as x` is available, any thoughts on `of Branch as var x` as sugar for the other simple form?
> Given that `of Branch as x` is available, any thoughts on `of Branch as var x` as sugar for the other simple form?
Maybe even `Branch as x` should not be done and only `Branch(let x)` should be available. We don't have a good history of providing more than one syntax and letting people choose; it always ends up in mild fights in the style guides.
Not applicable anymore as there is an inherent difference between `as x` and `var x`: the `x` has a different type.
> The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for `nim-result`):
> let x: Either[int, string] = Le(42)
> i.e. `Le(42)` does not need to know about `string`.
The devel compiler has `{.experimental: "inferGenericTypes".}` which lets you do that for functions already; hopefully a similar mechanism can be applied here.
I think this will be better for pattern matching:
case n
of BinaryOpr(a: var le, b: UnaryOpr(a: let ri)) if le == ri:
  le = ... # can write-through
It's more like an object constructor, and it allows skipping fields, like:
case n
of BinaryOpr(b: UnaryOpr(a: let ri)):
  echo ri # I don't care about the left operand in BinaryOpr
@ASVIEST "don't cares" should be written as `_`. I think the syntax without `field:` is preferable as it's shorter and still allows for everything.
> @ASVIEST "don't cares" should be written as `_`. I think the syntax without `field:` is preferable as it's shorter and still allows for everything.
This may be inconvenient for objects with a fairly large number of fields. Maybe we can allow both of these syntaxes, like https://github.com/nim-lang/RFCs/issues/517 (or maybe better https://github.com/nim-lang/RFCs/issues/418)?
type
  SampleChunk = object
    chunkID: uint32
    size: uint32
    manufacturer: uint32
    product: uint32
    samplePeriod: uint32
    unityNote: uint32
    pitchFraction: uint32
    smpteFmt: uint32
    smpteOffset: uint32
    loopCnt: uint32
    dataSize: uint32
    sampleLoops: seq[SampleLoop]
    data: seq[byte]
  SampleLoop = object
    id: uint32
    typ: uint32
    start: uint32
    endBlock: uint32
    frac: uint32
    playbacks: uint32
  WaveChunk = case
    of Sample: SampleChunk
    ...
var x = Sample()
case x
of Sample(_, _, _, _, _, _, _, _, _, _, let dataSize, _, _):
  echo dataSize
else: discard
# vs
case x
of Sample(dataSize: let dataSize):
  echo dataSize
else: discard
Allowing both syntaxes would be convenient both for objects with a large number of fields and for objects with a small number. BTW, apparently I'm not the only one who thought that `fieldX: let x` was also needed: https://github.com/nim-lang/RFCs/issues/537#issuecomment-1752100391
It's just:
case x
of Sample(let x):
  echo x.dataSize
else: discard
To select a single field there is no reason to unpack stuff. Keep things simple.
> It's just:
> case x
> of Sample(let x):
>   echo x.dataSize
> else: discard
> To select a single field there is no reason to unpack stuff. Keep things simple.
It means that in
case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through
`let b` is `UnaryOpr` and not `ref Node`? `var a` is `ref Node`, what?
Maybe you mean this?
case x
of Sample():
  echo x.dataSize
else: discard
Sorry, I made a mistake. Please re-read the proposal. :-/
I mean this:
case x
of Sample as y:
  echo y.dataSize
else: discard
Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not[^1]. Allowing positional pattern matching for object types would result in their public field ordering forming part of their API; developers would not be able to move fields around, nor insert new fields before existing fields, nor remove existing fields not at the end of the object, without introducing compatibility-breaking changes.
Considering that only a small subset of objects have a conceptually "natural" field ordering, introducing such behavior doesn't make sense. For objects that do have a conceptually significant field ordering, a tuple type is more appropriate.
[^1]: Though they do have minor features that rely on field ordering.
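To make the ordering argument concrete, here is a small sketch in current Nim (the `Pair` and `Point` types are invented for illustration). Tuple unpacking is already positional, so the field order is part of what callers rely on, whereas object fields are constructed and read by name, so their declaration order does not leak into user code.

```nim
type
  Pair = tuple[first: int, second: string]
  Point = object
    x, y: int

let p: Pair = (first: 1, second: "one")

# Tuple unpacking is positional: `a` gets the first field, `b` the second.
let (a, b) = p

# `_` discards a position we do not care about.
let (_, onlySecond) = p

# Object construction is by name, so the order written here is irrelevant
# and reordering the fields in `Point` would not break this line.
let pt = Point(y: 2, x: 1)

echo a, " ", b, " ", onlySecond, " ", pt.x, " ", pt.y
```

If object branches allowed positional patterns, swapping `x` and `y` in the declaration of `Point` would silently change what such a pattern binds.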
> Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not
In general, I agree, but I think that instead of the complete absence of positional pattern matching for objects, it should remain available for objects marked with a new pragma, `{.positional.}`.
Can we focus on the question of whether `Branch as x` vs `Branch(let x)` with distinct meanings is a good design? Can it be avoided? Did anybody even notice?
They could be merged into (assuming `Branch(let x)` means "unpack the first field of the branch type to x" and not "unpack the full branch to x"):
of Branch as BranchType(let x):
of Branch as (let x): # for anonymous objects/tuples, maybe requires trailing comma
of Branch as (let x, let y):
of Branch as let (x, y):
of Branch as (fieldName: let x):
# if we don't care to make bindings explicit, we can always assume identifiers to be `let`/`var`
of Branch as BranchType(x):
of Branch as (x):
of Branch as (x, y):
of Branch as var x: # makes this possible for what it's worth
of Branch as let x: # and this
# this would be a reuse of a potential mechanism like:
BranchType(let x) = y
let BranchType(x) = y # x assumed to be let
let (x) = y # language already allows this to unpack unary tuples
(let x, let y) = z
If all of these are ugly, I don't see how `Branch(let x)` as sugar for `Branch as BranchType(let x)` is unsound.
I don't understand your idea.
Personally, I would prefer something closer to Rust:
type
  Option[T] = case
    of None()
    of Some(T)
  Either[A, B] = case
    of Le(A)
    of Ri(B)
  Node = case
    of BinaryOpr:
      a, b: ref Node
    of UnaryOpr:
      a: ref Node
    of Variable(string)
    of Value(int)
At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.
> At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.
I think that having sum types be a mapping X -> Y is really nice: you can keep Y in a different module, or as just an object, which makes the code more readable and flexible, and you can add logic for Y and then just use it. When you do not need a Y type, just use a tuple:
type
  Node = case
    of BinaryOpr: (a, b: ref Node)
    of UnaryOpr: (a: ref Node)
    of Variable(string)
    of Value(int)
Doing it like Rust removes flexibility without increasing the simplicity of the code. It is also possible that such sum types are simpler to implement in backends.
It also fits better into the logic of the case statement:
type
  Sth = case
    of A1: B1
    of A2: B2
B1 is not just a field list, it's a type.
Regarding testing the underlying type of a sum-typed variable, requiring use of a `kind` function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:
proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    ...
It is also consistent with how most sum types are currently implemented (via object variants):
type Node = object
  case kind: NodeKind
  of nkBinaryOpr:
    ...
proc traverse(n: ref Node) =
  case n[].kind
  of nkBinaryOpr:
    ...
Furthermore, it would allow retrieving the "kind" of a sum-typed variable in other contexts (such as logging/debugging).
proc traverse(n: ref Node) =
  echo repr(kind(n))
Regarding using sum types, couldn't the current rules for object variants be used, where a variable can be used as a tested type when the compiler can statically determine that it is that actual type?
proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    echo(n.left, n.right) # BinaryOpr-specific fields
  of UnaryOpr:
    echo(n.term) # UnaryOpr-specific field
  ...
Again, this would be consistent with how object variants are currently handled. I would need to defer to the compiler-devs, but I believe this might also allow re-use of existing compiler code too.
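For readers less familiar with the existing mechanism, the object-variant style referred to above can be sketched as follows. This is a self-contained toy: the `nkValue` branch and the `eval` proc are additions for illustration and do not come from the proposal.

```nim
type
  NodeKind = enum
    nkBinaryOpr, nkUnaryOpr, nkValue
  Node = ref object
    case kind: NodeKind
    of nkBinaryOpr:
      left, right: Node   # only valid when kind == nkBinaryOpr
    of nkUnaryOpr:
      term: Node          # only valid when kind == nkUnaryOpr
    of nkValue:
      value: int

# Toy semantics: a binary node adds its operands, a unary node negates.
proc eval(n: Node): int =
  case n.kind
  of nkBinaryOpr: eval(n.left) + eval(n.right)
  of nkUnaryOpr: -eval(n.term)
  of nkValue: n.value

let tree = Node(kind: nkBinaryOpr,
                left: Node(kind: nkValue, value: 2),
                right: Node(kind: nkUnaryOpr,
                            term: Node(kind: nkValue, value: 5)))

# The discriminator is an ordinary enum value, so it can be logged as-is.
echo tree.kind, " evaluates to ", eval(tree)
```

Note that field access here is only checked at run time (a wrong branch raises a `FieldDefect` when field checks are enabled), which is the gap the proposal tries to close by enforcing access through pattern matching.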
I know that the above syntax isn't "exciting" or "new", but it is consistent, reducing the amount of surprise/complexity that is needed to use sum objects - from a user's perspective, a sum type works "just like" an object variant (which it arguably is). It also makes migrating existing sum-types-via-object-variant code very easy.
Pattern matching, I feel, would be best implemented in a different proposal. I think that tuple unpacking and templates already serve to address most of the "boilerplate" that pattern matching is usually meant to address.
(If desired, I can go into more detail regarding what I feel are the benefits of the above syntax, but I figured I would write a shorter explanation first)
> Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:
Didn't see anyone else bringing this up but it is a good point that passing around the kind of the type as a value should still be possible. A less ambiguous name for this could be `caseof`, i.e.
type Foo = case
  of A: int
  of B: string
let foo = B("abc")
let kind = caseof(foo)
echo kind # B
Then we could also have `caseof(Foo)` as the type of the individual kind symbols, as a compiler-generated enum type (?).
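As a point of reference, this is roughly what the hand-written equivalent looks like with today's object variants, where the enum has to be declared manually. The names `FooKind`, `i`, `s` and `kindOfFoo` are illustrative only; `caseof` itself is a suggestion from this thread and does not exist.

```nim
type
  FooKind = enum    # today this enum is written by hand
    A, B
  Foo = object
    case kind: FooKind
    of A: i: int
    of B: s: string

let foo = Foo(kind: B, s: "abc")

# The discriminator is a plain enum value that can be stored and passed on.
let kindOfFoo: FooKind = foo.kind
echo kindOfFoo
```

A compiler-generated `caseof(Foo)` would play the role of `FooKind` here, without the user having to keep the enum and the branches in sync.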
> whether `Branch as x` vs `Branch(let x)` with distinct meanings is a good design?
It feels like the latter is "unpacking" and the former is "assignment". I lean towards those two having different meanings. It is also nice to NOT have two syntaxes to do the same thing.
Outside of pattern matching, `Branch(x)` is a constructor, where `x` is the internal value. I think it will be more natural to have `Branch(let x)` behave like a "deconstructor" and `Branch as x` behave like assignment.
> Did anybody even notice?
I noticed it right away.
It's an improvement over the current object variant mechanism. I like `case foo of bar as baz` being distinct from `case foo of bar(var baz1, let baz2)`. I feel both forms have real value.
I'm still not really sure why the discrimination isn't simply done on the inner type, without adding a new name to each type in the sum. After all, it's really the type we're interested in. Even an "anonymous" type could be used with this. It would also make the `as` simpler, although perhaps less useful. But maybe that's a good thing?
type
  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node
  Node = case
    of BinaryNode
    of UnaryNode
    of OtherNode = ref Node # "anonymous" type
    of string
    of Name = string # "anonymous" type
    of int
proc traverse(n: ref Node) =
  case n[] as x # The "as x" doesn't seem as helpful this way, but it's still nice.
  of BinaryNode:
    traverse x.a
    traverse x.b
  of UnaryNode:
    traverse x.a
  of OtherNode:
    traverse x
  of string:
    echo x
  of Name:
    echo "Hello ", x
  of int:
    counter += x
var myNode: Node[int] = 42
# Which is better looking? Which is easier to implement?
if myNode[int]:
  echo "The answer is: ", myNode + 0
elif myNode[] is Name:
  echo "Hello, " & myNode & "!"
elif myNode of string as ovaltine:
  echo "The secret decoder message of the day is: ", ovaltine
Sum types, 2024 variant
There is a new type construct, `case`, that gives Nim sum types, comparable to ML's, Rust's, etc.
Constructing a case branch uses the branch name plus its payload in parentheses. However, `BinaryOpr(BinaryNode(a: x, b: y))` can be shortened to `BinaryOpr(a: x, b: y)`; an analogous shortcut exists for tuples.
To access the attached values, pattern matching must be used. This enforces correct access at compile-time.
Access via `as`
The syntax `of Branch as x` can be used to unpack the sum type to `x`.
`of Branch as variable` is the basic form. For `of Branch: T` the `variable` has the type `T` or `var T` depending on the mutability of the expression `x` in `case x`.
Pattern matching
A variable of type `T` might be inconvenient so there is also pattern matching combined with unpacking:
These new syntaxes `of Branch as x` and `of Branch(let x)` can later be naturally extended to if statements: `if n of BinaryOpr as x` or `if n of Some(var n)`.
More complex pattern matching
Proposed syntax:
Serialization
There are two new macros that can traverse sum types:
`constructCase` takes in a type `T` and an expression in order to construct a case type `T`.
`unpackCase` takes in a value of a case type and an expression in order to traverse the data structure.
For example:
Anon object types
Later we can add more sugar so that the definition can be simplified to:
This way the simplicity is kept that every branch is tied to exactly one type, which makes iteration over `case` types in a generic context much easier.