Open metagn opened 1 year ago
So for defining a normal ADT, you first need to define a distinct/object for each variant? I don't see the advantage of this over directly supporting ADTs.
I already wrote all the points above but I guess more succinctly:
case, of mechanisms
Also "defining a normal ADT" does not have to be any different than that. It would be really simple to build a macro that defines one of these types based on ADT syntax (I used the syntax type List[T] = adt ... above but that wouldn't work with generics):
type List[T] {.adt.} = Nil | Node[T](value: T, next: List[T])
# becomes
type
  Nil = object # or Nil[T] = object
  Node[T] = object
    value: T
    next: List[T]
  List[T] = union(Nil, Node[T])
If anything "you have to define distinct/object types for each variant" is a good thing, because the information about each variant is available as an existing type. You also get to act on these as a "set of types", meaning you can break them down into their parts, or tack on new types easily.
Imagine you have a type like this:
type
  FooKind = enum fooInt, fooFloat, foo32Array
  Foo = object
    case kind: FooKind
    of fooInt:
      i: int
    of fooFloat:
      f: float
    of foo32Array:
      arr: array[32, Foo]
Now imagine if you had a large seq of Foo that only contained fooInt and fooFloat nodes. You wouldn't define it as seq[Foo], because then it would use roughly 30x as much memory as it would if you only considered the branches of Foo that use int or float. Instead you have to do something like:
type
  FooKind = enum fooInt, fooFloat, foo32Array
  Foo = object
    case kind: FooKind
    of fooInt:
      i: int
    of fooFloat:
      f: float
    of foo32Array:
      arr: array[32, Foo]
  FooIntFloat = object
    case kind: FooKind
    of fooInt:
      i: int
    of fooFloat:
      f: float
    else: discard
proc intFloatOnly(foo: Foo): FooIntFloat =
  case foo.kind
  of fooInt: FooIntFloat(kind: fooInt, i: foo.i)
  of fooFloat: FooIntFloat(kind: fooFloat, f: foo.f)
  else:
    raise newException(FieldDefect, "runtime error for non-int-float kind!")
proc backToFoo(fif: FooIntFloat): Foo =
  case fif.kind
  of fooInt: Foo(kind: fooInt, i: fif.i)
  of fooFloat: Foo(kind: fooFloat, f: fif.f)
  else: discard # unreachable
var s = newSeqOfCap[FooIntFloat](2000)
proc generateFoo(n: int): Foo =
  if n mod 2 == 1:
    Foo(kind: fooInt, i: n)
  else:
    Foo(kind: fooFloat, f: n.float)
proc consumeFoo(foo: Foo) =
  echo foo
for i in 1..2000:
  let foo = generateFoo(i)
  s.add(intFloatOnly(foo))
for x in s:
  let foo = backToFoo(x)
  consumeFoo(foo)
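To make the memory argument concrete, here is a minimal sketch that compiles with today's Nim. It is an illustration only: the large branch holds array[32, int] instead of the recursive array[32, Foo] so that the type has a finite size, and the name fooArray is made up for this sketch. The exact sizes depend on platform and alignment, so none are stated here.

```nim
type
  FooKind = enum fooInt, fooFloat, fooArray
  Foo = object
    case kind: FooKind
    of fooInt: i: int
    of fooFloat: f: float
    of fooArray: arr: array[32, int]  # assumption: non-recursive stand-in
  FooIntFloat = object
    case kind: FooKind
    of fooInt: i: int
    of fooFloat: f: float
    else: discard

# Every Foo reserves room for its largest branch, even when it holds an int.
echo "sizeof(Foo) = ", sizeof(Foo)
echo "sizeof(FooIntFloat) = ", sizeof(FooIntFloat)
```

A seq[Foo] pays the larger size for every element, which is why the restricted type and the two conversion procs are needed at all.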
ADTs only make the syntax for this nicer, and in one respect make it worse: you cannot reuse FooKind, so for each branch of the restricted type you have to come up with either conflicting names or names distinct from the original.
That is, unless you introduce some "restricted ADT" type like Int | Float that is smart enough to use the smallest size and automatically generate the conversion between it and the real ADT type. Except you cannot use | because it's used for the union typeclass. Maybe union(Int, Float)?
Now with this proposal:
type
  Int = distinct int
  Float = distinct float
  Array32 = distinct array[32, Foo]
  Foo = union(Int, Float, Array32)
# or, assuming an `adt` macro exists
type Foo = adt Int(int) | Float(float) | Array32(array[32, Foo])
var s = newSeqOfCap[union(Int, Float)](2000)
proc generateFoo(n: int): Foo =
  if n mod 2 == 1:
    Int(n)
  else:
    Float(n.float)
proc consumeFoo(foo: Foo) =
  echo foo
for i in 1..2000:
  let foo = generateFoo(i)
  case foo
  of Int: s.add(Int(foo))
  of Float: s.add(Float(foo))
  # we get to deal with invalid cases at the callsite because it's much less cumbersome
  else: raise newException(FieldDefect, "shouldn't happen")
for x in s:
  let foo =
    case x
    of Int: Foo(Int(x))
    of Float: Foo(Float(x))
  consumeFoo(foo)
I have to be clear here though that I am not pushing the union(A, B) syntax, it's just the one that came to mind for the purpose of demonstration.
Something this could maybe do away with to make the implementation reasonable is order-invariance, i.e. union(int, float) is not union(float, int). This would keep a lot of the benefits of structural typing as well as make the potential feature of matching to enums i.e. union(fooA: int, fooB: float) much easier to implement.
A common use case I didn't mention above would be:
type Error = distinct string
proc foo(x: int): union(int, Error) =
  if x < 0:
    return Error("argument was negative")
  if x mod 2 == 0:
    return Error("argument was even")
  result = (x - 1) div 2
let res = foo(122)
case res
of Error: echo "got error: ", string(Error(res))
of int: echo "got successful result: ", int(res)
Stuff like this would not be affected by order relevance.
Adding onto that example, this being language level means we could optimize things like Option[range[1..5]] = union(void, range[1..5]) into range[0..5] in the backend. I think Rust does optimizations like this for enums.
order-invariance is a completely alien concept for Nim and I don't like it.
Adding onto that example, this being language level means we could optimize things like Option[range[1..5]] = union(void, range[1..5]) into range[0..5] in the backend.
There are two levels of order-invariance - I argue here that there are significant benefits to have it at the ABI level along with other freedoms such as moving fields around between objects, flattening them in other ways than current Sup, joining "smaller" fields like bool etc - I don't think that should extend to the language level however - ie in source code, order should remain significant.
This design fundamentally merges a runtime value (often called "tag") with a typename. This seems to have unforeseeable interactions with Nim's meta-programming:
type
  U = union(Foo, Bar)
macro inspectType(t: typedesc)
inspectType Foo # valid, Foo is a type.
macro inspectValue[T](t: static T)
inspectValue Foo # also valid?
The RFC also offers no solution for pattern matching. But conversions like string(Error(res)) are a poor substitute and it's not obvious that these conversions cannot fail at runtime (or can they?).
The fact that this sum type is structural is not a huge benefit as the 2 most important structural types are easily replicated via generics: Opt[T] and Either[X, Y]. On the contrary a nominal type can naturally offer a couple of pragmas that influence the layout, where hidden gaps can be exploited or even ABI versions. These things are much harder to do with a structural type where it's encouraged to repeat single constructions like union(int, ErrorCode) everywhere.
Once again, I arrive at something like:
type
  Node = ref enum
    of BinaryOpr:
      x, y: Node
    of UnaryOpr:
      x: Node
    of Name(string)
  Option[T] = enum
    of None()
    of Some(T)
  Either[A, B] = enum
    of Le(A)
    of Ri(B)
merges a runtime value with a typename
This isn't a huge frontend issue, expressions can already have type typedesc (which is incompatible with static), the compiler would internally understand their meaning in these contexts. Though this still has problems, for example a naive case exhaustiveness check would be O(n^2) in terms of sameType calls. We could still have a tag value separate from the type i.e. Node = union[BinaryOp: (Node, Node), UnaryOp: Node, Name: string], but this wouldn't have an obvious construction/deconstruction syntax.
it's not obvious that these conversions cannot fail at runtime (or can they?)
It wouldn't be different from the current situation with object variant branch access, which I realize now pattern matching is better for.
If we don't reuse existing types, the symbols like BinaryOpr, Some have to behave entirely local to their parent type, i.e. the expression None() is invalid, we have to do Option[int].None() (or this gets inferred). This implies the entire expression pattern None() or Some(x) etc. is like a parametrized enum value subject to overloads (including overloads with respect to generic parameters) as opposed to being a constructor of a type None or Some. An implementation would have to pay attention to this.
An implementation would have to pay attention to this.
Correct, but we have been making enum symbols smarter ("overloadable") already.
Abstract
Add a structural, unordered sum type to the language, equivalent in the backend to a corresponding object variant type.
Motivation
From #525:
On top of these, I would add:
#525 and many other proposals propose some version of ADTs to deal with these problems. However this still has issues:
case syntax which is ambiguous with the existing case syntax which can include complex expressions in its discriminators that evaluate to values
object and tuple types
Description
In this proposal I will use the temporary placeholder syntax of union(A, B, ...) to represent these sum types. I like the syntax {A, B, ...} instead (at least as sugar) due to both 1. mirroring with (A, B, ...) tuple syntax and 2. similarities in behavior with sets, but this syntax might be hard to make out in this post.
Basically we add a new type kind to the language that has an indefinite number of children that are all nominally unique concrete types, i.e. A, B = distinct A and C = distinct A, D[T] = distinct A can form a type union(A, B, C, D[A], D[B]) but A, B = A, C = sink A, D[T] = A etc can only form union(A). In any case union(A, B) is also equal to union(B, A), meaning internally the children of the type are sorted under some scheme.
The type relation between A and union(A, ...) is that they are convertible. The subtype relation as in inheritance might also work but for now this seems the most simple.
In the backend, Foo = union(A, B, ...) (where the children are sorted) becomes equivalent to something like:
For the sake of safety in the case of uninitialized values or efficiency in the case of things like Option we can also introduce a none/nil kind that represents no type in the union. This would be unavailable on the frontend, values of these types must always be initialized to a value of some type in the union.
Construction
Construction of values of these types is as simple as a type conversion between the element type to the sum type. That is:
first transforms into the following at type-check time:
which the backend can then turn into:
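As a rough sketch of the kind of lowering described here, written in today's Nim: the tag enum, the wrapper object and the field names (IntOrFloatKind, IntOrFloat, asInt, asFloat) are all hypothetical, and the converters only approximate the proposed implicit conversion from an element type to the sum type. This is not the RFC's actual generated code.

```nim
type
  Int = distinct int
  Float = distinct float

  IntOrFloatKind = enum        # hypothetical hidden tag, one value per child type
    kInt, kFloat

  IntOrFloat = object          # hand-written stand-in for union(Int, Float)
    case kind: IntOrFloatKind
    of kInt: asInt: Int
    of kFloat: asFloat: Float

# Construction: converting an element type to the sum type sets the tag
# and stores the value in the matching branch.
converter toUnion(x: Int): IntOrFloat =
  IntOrFloat(kind: kInt, asInt: x)

converter toUnion(x: Float): IntOrFloat =
  IntOrFloat(kind: kFloat, asFloat: x)

let a: IntOrFloat = Int(3)      # goes through the Int converter
let b: IntOrFloat = Float(2.5)  # goes through the Float converter
echo a.kind, " ", b.kind
```

With a language-level union the tag enum and field names would stay hidden, and the conversion would not need user-defined converters.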
Destructuring & inspection
We can very trivially reuse the existing case and of frontend mechanisms to check which type they are at runtime with (I believe) zero incompatibilities. And again destructuring just becomes a type conversion.
In the backend:
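A minimal sketch of what that could amount to, again in today's Nim with hypothetical names (ResKind, Res, intVal, errVal): a case over the element types becomes a case over a hidden tag, and each destructuring conversion becomes a field access inside the matching branch.

```nim
type
  Error = distinct string

  ResKind = enum               # hypothetical hidden tag
    kInt, kError

  Res = object                 # hand-written stand-in for union(int, Error)
    case kind: ResKind
    of kInt: intVal: int
    of kError: errVal: Error

proc describe(res: Res): string =
  # `case res of Error: ... of int: ...` would become a switch on the tag;
  # `string(Error(res))` and `int(res)` would become the field reads below.
  case res.kind
  of kError: "got error: " & string(res.errVal)
  of kInt: "got successful result: " & $res.intVal

echo describe(Res(kind: kInt, intVal: 60))
echo describe(Res(kind: kError, errVal: Error("argument was negative")))
```

As with object variants today, reading a field outside its matching branch would fail at runtime, so the conversion is only safe inside the branch that already checked the tag.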
A limitation is that there is no good way to have the information of the exact type of the union as a value on the frontend (what would normally be x.kind), but we could maybe alleviate this by allowing to attach these to an enum type, i.e. union(fooA: A, fooB: B, ...). But then the question arises of whether these types are compatible with other union types of the same elements but with a different/no attached enum. In any case you can generate a case statement like case x of A: fooA of B: fooB ... but this would be less efficient than just using the kind in the backend.
Other points
A frequently mentioned use case in #525 was recursion with pointer indirection. In the current language this works in union typeclasses but not in tuple types: the manual mentions that "In order to simplify structural type checking, recursive tuples are not valid". Maybe recursive unions can just be nominal? Or the canonicalization scheme (which tuples don't have) can account for recursion.
This is not an alternative solution to pattern matching or object variants, it's just an alternate solution to ADTs for the current problems with object variants, and ADTs happen to require pattern matching while this doesn't (but this is still compatible with pattern matching). People tend to see object variants as black and white, worse or better than other representations of sum types but in practice they both have their uses especially in such a general purpose language.
type List[T] = adt Nil | Node[T](value: T, next: List[T])
This has partially been implemented in a library (https://github.com/alaviss/union/) but a library solution is not sufficient for proper use of this:
union(A, B, C) should probably not be equal to union(A, union(B, C)) since union(B, C) is still a concrete type; we can have an operation + that "merges" union types, so that A + union(B, C) + D or union(A, B) + union(C, D) all give a flattened union(A, B, C, D). This implies unions have set behavior, which might be the generics limitation mentioned above; it is nontrivial and in some cases impossible to infer generic parameters from these types.
Links
or or | in nim) which is not the goal here https://crystal-lang.org/reference/1.8/syntax_and_semantics/union_types.html
Code Examples
Backwards Compatibility
Should be completely compatible