nim-lang / RFCs

A repository for your Nim proposals.
135 stars 26 forks source link

Multiple usages of the case discriminator for case objects #368

Open Araq opened 3 years ago

Araq commented 3 years ago

This RFC is a summary of the discussion from https://github.com/nim-lang/RFCs/issues/19

The proposal that has won is to allow repeating the case discriminator field without the : type syntax. Like so:


type
  NodeKind = enum
    foo
    bar
    bif
    baz
    bad

  Node = object
    case kind: NodeKind
    of foo, bar, baz:
      children: seq[Node]
    of bad, bif:
      dad: Node

    case kind
    of foo:
       additionalFieldForFoo: int
    of bar:
      onlyValidForBar: float
    else: discard

    case kind
    of foo, bar:
      validForFooAndBar: string
   else: discard

It's by far the simplest and most elegant extension to the existing language. It's 100% backwards compatible since this code doesn't compile yet. The ABI is clear too, the re-used field is put only once into the object, at the same offset that it used to.

To do

Things left to specify: How the AST looks like for this construct, both before sem'checking and after sem'checking.

bluenote10 commented 3 years ago

Apart from the fact I personally find this quite ugly and hard to read: I was hoping that Nim leaves the door open for having real algebraic data types eventually, like Haskell or Rust enums. For instance via Node = object {.algebraic.}, allowing a field x having different types in different branches, and field access fully type checked via type guards. I feel with this syntax choice instead of repeated fields this goes out the window.

Araq commented 3 years ago

I feel with this syntax choice instead of repeated fields this goes out the window.

Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed. In fact, there is a warning for this via warning[ProveField]:on already.

bluenote10 commented 3 years ago

Huh? Why? We can always require obj.field to be used inside a case guard, nothing changed.

On usage site yes, but the grouping based approach doesn't seem to extend nicely towards heterogeneously typed fields. The type declaration would have to support something like:

type
  Node = object
    case kind: NodeKind
    of foo:
      children: seq[Node]
    of bar:
      children: seq[SealedNode]

I.e., it would require extension towards the first proposal in https://github.com/nim-lang/RFCs/issues/19 anyway. If that would be supported as well there would be two ways of writing the case where a field has the same type, either grouped or repeated, which seems weird.

In Rust I don't mind the repetition of fields in branches at all. Not only is a rare issue to have significant repetition, but the explicitness also helps to see all fields of a branch with a single glance, whereas the proposal requires actual reading to mentally assemble a foo vs a bar etc.

Araq commented 3 years ago

I think heterogeneously typed fields are so rare that it is not worth musing about. Just use different names for different things.

bluenote10 commented 3 years ago

They are not so rare in my opinion. When implementing e.g. a wrapper that is supposed to be bounded in types it makes sense to call its main data field just value or data, without having to creatively come up with a different unique name like valueArrayOfString in every branch.

I also find the feature helpful when implementing file formats that involve heterogeneous fields by the specs. Comparing implementing file formats in both Nim and Rust, I enjoyed Rust's enums a lot, because I could use the official spec field names despite incompatible types:

pub enum FileChunk {
    FooChunk { id: u16, header: FooHeaderType },
    BarChunk { id: u64, header: BarHeaderType },
    RomVersion { major: u16, minor: u16 },
    InstrumentVersion { major: u32, minor: u32 },
    // ...
}

Just use different names for different things.

They aren't necessarily different things semantically. A wrapper holds a "value", an "id" is a id, and a version has a "major" and a "minor" field, no matter what exact type is used. The elegance of algebraic data types is that fields can express whatever they want, without having to worry about conflicts or ugly type suffixes like id16 and id64.


Another example: Instruction opcodes. Take this Rust implementation of GameBoy opcodes as an example. It is straightforward from looking at the code what the individual opcodes require. This kind of representation makes it even possible to guess what they do. With grouping by fields this would become an absolute mess in my opinion. Admittedly in this specific example it would still be acceptable, because there are many unary (and few binary) opcodes, and sharing a field name would be still be relatively straightforward for them. But imagine many opcodes requires like 5 operands or so. What a headache to search more meaningful field intersections / unions and picking distinct names...

Araq commented 3 years ago

because I could use the official spec field names despite incompatible types

But only if you're lucky and the spec uses names that are not alien to your programming environment. Usually it's better to stick with the programming language's conventions.

Araq commented 3 years ago

On top of that, usually a spec is concerned with packed data and programming languages have little support for variable sized packed memory structures (yes, this includes holy low level C) so you need a DSL or some custom code anyway.

a-mr commented 3 years ago

So it won't be possible to define a shared field with another subset of tags?

E.g. to add to object in the example above:

    case kind
    of foo, bad:
      commonForFooAndBad: int
    else: discard

?

Araq commented 3 years ago

@a-mr Sure it would be possible, that's the point of this RFC... I don't understand your question.

a-mr commented 3 years ago

@Araq , you wrote your example as if all shared fields are placed in first case and second case without : type contains only non-shared fields. Hence the confusion.

Araq commented 3 years ago

Thanks, I've updated the example to make it clearer, I hope.

Araq commented 3 years ago

@timotheecour why the downvote?

timotheecour commented 3 years ago

because I much prefer your original proposal in https://github.com/nim-lang/RFCs/issues/19#issuecomment-173529871

this new RFC prompted me to go ahead and implement your original proposal, I have a PR in the works for that

rationale:

case {.allowDups.} t.typ
of routineKinds: ... # includes skTemplate
of skTemplate: ... # extra code
...
else: discard

note

i don't understand this objection though https://github.com/nim-lang/RFCs/issues/368#issuecomment-820701871 (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT). So shared fields must have same type.

bluenote10 commented 3 years ago

i don't understand this objection though #368 (comment) (heterogenous type fields); i don't see how this can work (FieldDefect is checked at RT, not CT).

My hope simply was that one day the check would become an (opt-in) CT check to make case object fully type safe, instead of relying on RT checks. I.e.:

type
  WrapperKind = enum
    Str
    Num

  # Perhaps also {.typesafe.}
  Wrapper {.algebraic.} = object
    case kind: WrapperKind
    of Str:
      value: string
    of Num:
      value: int

The "algebraic" opts in to enforce all access to this case object behind type guards only. This enables the reuse of value with heterogeneous types in different branches. Omitting the {.algebraic.} would result in a compiler error like:

Field "value" has different types in different case branches. Only {.algebraic.} case objects can have heterogeneous types.
a-mr commented 3 years ago

Extension to the proposal: allow if (in addition to case) for object variants

Rationale:

  1. avoids else: discard statements
  2. more consistent: if we have case then why not have if? Also if branches will look the same for both usage and definition of the fields

For point 1, the example in the description would be idiomatically written so:

  Node = object
    case kind: NodeKind
    of foo, bar, baz:
      children: seq[Node]
    of bad, bif:
      dad: Node

    if kind == foo:
       additionalFieldForFoo: int

    if kind == bar:
      onlyValidForBar: float

    if kind in {foo, bar}:
      validForFooAndBar: string

For point 2, if we e.g. have boolean kind (see an example here) it could be written more naturally as:

  ItemFragment = object
    isRst: bool
    if isRst:
      rst: PRstNode
    else:
      str: string

Of course, only simplest syntactic forms of ifs would be recognized:

if field == ...:
if field in {...}:
if field:   # for boolean `field`
theamarin commented 2 years ago

The example given by @bluenote10 is exactly what I tried first when I was experimenting with Nim. I think this is the least astonishing way and quite intuitive.

So I strongly vote for his suggestion:

type
  Node = object
    case kind: NodeKind
    of foo:
      children: seq[Node]
    of bar:
      children: seq[SealedNode]

I would really appreciate to see this in Nim, the sooner the better! :blush:

yuanweixin commented 1 year ago

In terms of priority, is there plan to include this in Nim 2 this year (2023)?

Araq commented 1 year ago

No, the focus shifted since then to a related, but yet-to-be-written RFC.

khaledh commented 1 year ago

@Araq What's the status on this? Any progress on the related RFC?

Araq commented 1 year ago

As I said, no and I'm not sure I like the idea anymore.