Open Araq opened 3 years ago
Apart from the fact I personally find this quite ugly and hard to read: I was hoping that Nim leaves the door open for having real algebraic data types eventually, like Haskell or Rust enums. For instance via Node = object {.algebraic.}
, allowing a field x
having different types in different branches, and field access fully type checked via type guards. I feel with this syntax choice instead of repeated fields this goes out the window.
I feel with this syntax choice instead of repeated fields this goes out the window.
Huh? Why? We can always require obj.field
to be used inside a case
guard, nothing changed. In fact, there is a warning for this via warning[ProveField]:on
already.
Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed.
On usage site yes, but the grouping based approach doesn't seem to extend nicely towards heterogeneously typed fields. The type declaration would have to support something like:
type
Node = object
case kind: NodeKind
of foo:
children: seq[Node]
of bar:
children: seq[SealedNode]
I.e., it would require extension towards the first proposal in https://github.com/nim-lang/RFCs/issues/19 anyway. If that would be supported as well there would be two ways of writing the case where a field has the same type, either grouped or repeated, which seems weird.
In Rust I don't mind the repetition of fields in branches at all. Not only is a rare issue to have significant repetition, but the explicitness also helps to see all fields of a branch with a single glance, whereas the proposal requires actual reading to mentally assemble a foo
vs a bar
etc.
I think heterogeneously typed fields are so rare that it is not worth musing about. Just use different names for different things.
They are not so rare in my opinion. When implementing e.g. a wrapper that is supposed to be bounded in types it makes sense to call its main data field just value
or data
, without having to creatively come up with a different unique name like valueArrayOfString
in every branch.
I also find the feature helpful when implementing file formats that involve heterogeneous fields by the specs. Comparing implementing file formats in both Nim and Rust, I enjoyed Rust's enums a lot, because I could use the official spec field names despite incompatible types:
pub enum FileChunk {
FooChunk { id: u16, header: FooHeaderType },
BarChunk { id: u64, header: BarHeaderType },
RomVersion { major: u16, minor: u16 },
InstrumentVersion { major: u32, minor: u32 },
// ...
}
Just use different names for different things.
They aren't necessarily different things semantically. A wrapper holds a "value", an "id" is a id, and a version has a "major" and a "minor" field, no matter what exact type is used. The elegance of algebraic data types is that fields can express whatever they want, without having to worry about conflicts or ugly type suffixes like id16
and id64
.
Another example: Instruction opcodes. Take this Rust implementation of GameBoy opcodes as an example. It is straightforward from looking at the code what the individual opcodes require. This kind of representation makes it even possible to guess what they do. With grouping by fields this would become an absolute mess in my opinion. Admittedly in this specific example it would still be acceptable, because there are many unary (and few binary) opcodes, and sharing a field name would be still be relatively straightforward for them. But imagine many opcodes requires like 5 operands or so. What a headache to search more meaningful field intersections / unions and picking distinct names...
because I could use the official spec field names despite incompatible types
But only if you're lucky and the spec uses names that are not alien to your programming environment. Usually it's better to stick with the programming language's conventions.
On top of that, usually a spec is concerned with packed data and programming languages have little support for variable sized packed memory structures (yes, this includes holy low level C) so you need a DSL or some custom code anyway.
So it won't be possible to define a shared field with another subset of tags?
E.g. to add to object in the example above:
case kind
of foo, bad:
commonForFooAndBad: int
else: discard
?
@a-mr Sure it would be possible, that's the point of this RFC... I don't understand your question.
@Araq , you wrote your example as if all shared fields are placed in first case
and second case
without : type
contains only non-shared fields. Hence the confusion.
Thanks, I've updated the example to make it clearer, I hope.
@timotheecour why the downvote?
because I much prefer your original proposal in https://github.com/nim-lang/RFCs/issues/19#issuecomment-173529871
this new RFC prompted me to go ahead and implement your original proposal, I have a PR in the works for that
rationale:
case
statements in a similar way in future work:case {.allowDups.} t.typ
of routineKinds: ... # includes skTemplate
of skTemplate: ... # extra code
...
else: discard
i don't understand this objection though https://github.com/nim-lang/RFCs/issues/368#issuecomment-820701871 (heterogenous type fields); i don't see how this can work (FieldDefect
is checked at RT, not CT). So shared fields must have same type.
i don't understand this objection though #368 (comment) (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT).
My hope simply was that one day the check would become an (opt-in) CT check to make case object fully type safe, instead of relying on RT checks. I.e.:
type
WrapperKind = enum
Str
Num
# Perhaps also {.typesafe.}
Wrapper {.algebraic.} = object
case kind: WrapperKind
of Str:
value: string
of Num:
value: int
The "algebraic" opts in to enforce all access to this case object behind type guards only. This enables the reuse of value
with heterogeneous types in different branches. Omitting the {.algebraic.} would result in a compiler error like:
Field "value" has different types in different case branches. Only {.algebraic.} case objects can have heterogeneous types.
if
(in addition to case
) for object variantsRationale:
else: discard
statementscase
then why not have if
? Also if
branches will look the same for both usage and definition of the fieldsFor point 1, the example in the description would be idiomatically written so:
Node = object
case kind: NodeKind
of foo, bar, baz:
children: seq[Node]
of bad, bif:
dad: Node
if kind == foo:
additionalFieldForFoo: int
if kind == bar:
onlyValidForBar: float
if kind in {foo, bar}:
validForFooAndBar: string
For point 2, if we e.g. have boolean kind
(see an example here) it could be written more naturally as:
ItemFragment = object
isRst: bool
if isRst:
rst: PRstNode
else:
str: string
Of course, only simplest syntactic forms of if
s would be recognized:
if field == ...:
if field in {...}:
if field: # for boolean `field`
The example given by @bluenote10 is exactly what I tried first when I was experimenting with Nim. I think this is the least astonishing way and quite intuitive.
So I strongly vote for his suggestion:
type Node = object case kind: NodeKind of foo: children: seq[Node] of bar: children: seq[SealedNode]
I would really appreciate to see this in Nim, the sooner the better! :blush:
In terms of priority, is there plan to include this in Nim 2 this year (2023)?
No, the focus shifted since then to a related, but yet-to-be-written RFC.
@Araq What's the status on this? Any progress on the related RFC?
As I said, no and I'm not sure I like the idea anymore.
This RFC is a summary of the discussion from https://github.com/nim-lang/RFCs/issues/19
The proposal that has won is to allow repeating the case discriminator field without the
: type
syntax. Like so:It's by far the simplest and most elegant extension to the existing language. It's 100% backwards compatible since this code doesn't compile yet. The ABI is clear too, the re-used field is put only once into the object, at the same offset that it used to.
To do
Things left to specify: How the AST looks like for this construct, both before sem'checking and after sem'checking.