nim-lang / RFCs

A repository for your Nim proposals.

compiler support for object construction shorthand (full fields initializer) #517

Open ringabout opened 1 year ago

ringabout commented 1 year ago

Motivation

The Nim language supports named field names with values to construct an object. It works perfectly fine and is unambiguous, since it's different from function calls or type conversions syntactically.

type
  Vector = object
    x, y, z: int

var x = Vector(x: 1, y: 2, z: 3)

However, compared to function calls which support both named and positional parameters, this way of construction seems not to be succinct and looks redundant. Indeed, it is. The field names are not actually needed, which means we can construct the object using unnamed field values. var x = Vector(1, 2, 3) just works. Why not add a shorthand for the object construction and call it a day?

Description

This RFC proposes a new way of object construction, which makes it handy to construct objects. You can mix it with named field values.

type
  Vector = object
    x, y, z: int

var x = Vector(1, 2, 3)
var y = Vector(1, y: 2, z)

All the fields need to be initialized in order. It means all cases below should give a proper error message (it might be relaxed in the future).

var x = Vector(1, 2)
var y = Vector(1, z: 3)
var y = Vector(1, y: 2)
var y = Vector(1, z: 3, y: 2)
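For contrast, here is a minimal sketch of what the existing named syntax already accepts in current Nim: fields may be omitted (they take their default value) and may appear in any order. It uses only the `Vector` type from the examples above.

```nim
type
  Vector = object
    x, y, z: int

# Named construction today: omitted fields are default-initialized ...
let a = Vector(x: 1, z: 3)
# ... and the field order at the call site is free.
let b = Vector(z: 3, y: 2, x: 1)

doAssert a.y == 0
doAssert b == Vector(x: 1, y: 2, z: 3)
```

The restrictions listed above would therefore apply only to the proposed unnamed form.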

As for object variants, the discriminator must be known at compile time because the compiler needs to know exactly which branch is selected. In the case below, the value of flag must be a constant.

type
  Ciao = object
    id: int
    case flag: bool
    of true:
      num: int
    of false:
      done: bool
    name: string

var x = Ciao(12, true, 1, "123")
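As a point of reference, the positional call above would correspond to the following named construction, which is valid in current Nim. This is only a sketch of the equivalence; the mapping of positions to fields (id, flag, num, name) is an assumption based on declaration order.

```nim
type
  Ciao = object
    id: int
    case flag: bool
    of true:
      num: int
    of false:
      done: bool
    name: string

# Named equivalent of Ciao(12, true, 1, "123"). The discriminator is a
# literal, so the compiler knows that the `num` branch is active.
var x = Ciao(id: 12, flag: true, num: 1, name: "123")

doAssert x.flag
doAssert x.num == 1
```

With a discriminator that is only known at run time, the compiler cannot tell whether the third positional value should go to `num` or to `done`, which is why the RFC asks for a constant.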

For historical reasons, an object with a single field cannot be initialized with an unnamed field value: Single(12) is always interpreted as a type conversion. The field name needs to be written explicitly in order to construct such an object.

type
  Single = object
    id: int

# var s = Single(12) # which gives type mismatch errors
var s = Single(id: 12)
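The ambiguity can be seen with a small sketch in current Nim. `Meters` is a hypothetical distinct type added here purely for illustration; it shows that `Type(value)` with one argument already has a meaning as a conversion.

```nim
type
  Single = object
    id: int
  Meters = distinct int   # hypothetical type, only for illustration

let m = Meters(12)        # Type(value) with one argument: a type conversion
let s = Single(id: 12)    # object construction needs the field name

doAssert int(m) == 12
doAssert s.id == 12
# An int cannot be converted to Single, so the unnamed form is rejected.
doAssert not compiles(Single(12))
```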

Backwards Compatibility

It can start from an experimental feature. It might disturb function calls with the same name as the type from other modules, which means the object construction shorthand might take precedence over function calls.

Araq commented 1 year ago

All the fields need to be initialized in order. It means all cases below should give a proper error message (it might be relaxed in the future).

This is inconsistent with named parameters in function calls where no such restriction exists.
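To illustrate the comparison, here is a short sketch of how routine calls behave in current Nim. `initVector` is a hypothetical helper, not something the RFC defines; it only demonstrates that positional and named arguments can be mixed, reordered, and omitted when defaults exist.

```nim
type
  Vector = object
    x, y, z: int

# Hypothetical constructor proc with default values for every parameter.
proc initVector(x = 0, y = 0, z = 0): Vector =
  Vector(x: x, y: y, z: z)

let a = initVector(1, z = 3)          # y omitted, falls back to its default
let b = initVector(1, z = 3, y = 2)   # named arguments in arbitrary order

doAssert a == Vector(x: 1, y: 0, z: 3)
doAssert b == Vector(x: 1, y: 2, z: 3)
```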

yglukhov commented 1 year ago

While we're at it, would it be beneficial to introduce some consistency in named parameters syntax, that is using = instead of :?

metagn commented 1 year ago

Sorry to diminish the effort that has gone into implementing this, but do we really need this? What problem does this (or #418) exactly solve? Is it worth the future maintenance, the extra code in the compiler and the required documentation? I really cannot empathize with wanting this, which is weird because the "stop adding new features" crowd seems to be missing here.

ringabout commented 1 year ago

This is inconsistent with named parameters in function calls where no such restriction exists.

It is easy to add support for omitting fields.

var x = Vector(1, 2)
var y = Vector(1, z: 3)
var y = Vector(1, y: 2)

If needed, I can also add support for arbitrary order.

vector(1, c: 3, b: 2)
vector(c: 3, 2, a: 1)

Araq commented 1 year ago

While we're at it, would it be beneficial to introduce some consistency in named parameters syntax, that is using = instead of :?

Colons are already overused in Nim though. Object constructors could use = though...

planetis-m commented 1 year ago

Adding new special syntaxes to Nim could increase its complexity. I remember how confusing it was to grasp the difference between the for loop tuple syntax with (,) versus no parentheses, and to figure out when pairs() was called implicitly. I expect that this feature could also pose a challenge for learners, as it doesn't 'mix' that well with what is already there. Consequently, I think there are two options available: either add nothing new to the object constructor syntax, or make it the same as a function call and treat the two as identical.

metagn commented 1 year ago

I can remember how confusing it was to grasp the dissimilarity between the for loop tuples syntax with (,) versus no parenthesis

For what it's worth I think this is straight up a bug and there should be no difference

Edit: Did not know about this, was thinking of something else

planetis-m commented 1 year ago

For what it's worth I think this is straight up a bug and there should be no difference

Here's what I meant: https://play.nim-lang.org/#ix=4sog

The proposed object construction syntax has similar gotchas as documented at the top.

ZoomRmc commented 1 year ago

I'd like to pose a few questions which this RFC doesn't yet answer to the full extent.

What problem does it solve? (Benefit)

The necessity to repeat object field names when typing object initialization by hand.

Does the solution introduce any new ways for bugs to occur? (Cost)

Whatever can't be statically analyzed by the compiler will be the source of bugs. To the same extent as with functions, declarations and usage usually have some distance between them, so relying on order without proper tooling support (such as definition tooltips) is unreliable. With tooling support (autocompletion) the issue simply disappears.

Additionally, it's a new special rule with an exception (special-casing the single field). This conflict with converters already hints at possible ambiguities.

Do we want it solved? Why?

The main argument for accepting this RFC is:

Compared to function calls … this way of construction seems not to be succinct and looks redundant.

Allow me a little digression, as it's hard to find proposals equal to this RFC, so let's look at the next closest thing it's inspired by.

  1. Why would we want to add another construction identical in its behaviour to function calls? If we look at the current state of PLs, the availability of named parameters in calls is considered a beneficial and sometimes even necessary feature.

    Some widely used style guides, such as The Hitchhiker's Guide to Python, recommend:

    When a function has more than two or three positional parameters, its signature is more difficult to remember and using keyword arguments with default values is helpful.

    Lisp and Ada have been featuring named parameters for a long time and this is one of the arguments even Stroustrup accepts, when he concurs that named arguments are a "useful bit of syntactic sugar that might make programs more readable and more robust", talking about one of the numerous proposals[1] for adding this feature to the language ("Design and evolution of C++", 1994, p.154). One of the arguments Bjarne made against named parameters, is that if names change in the declaration, it will break the code. The big difference is that this case is totally covered by the compiler! While change in the order for unnamed parameters can be caught only on type mismatch in calls. And of course, one should keep in mind in 1994 he was also worried about the cost of recompilation on such an error, which in 2022 looks a bit groundless (especially when talking about fast compilers such as Nim).

    Another argument against named arguments in function calls is that they encourage long function definitions with lots of arguments, instead of data encapsulation or other techniques. Well, this is partly true, but the alternative is that there are no named parameters and people still write functions with a dozen arguments, carefully following the right order with the help of a prayer, which makes it even worse. Also, this counter-argument doesn't really apply to objects, as their whole purpose is to bundle a bunch of data together (I don't contest that sometimes you're better off reducing their complexity by making them deeper instead of wider). Splitting them into smaller objects can still be beneficial, but it's totally situation-dependent.

  2. Designated initializers appeared in C with C99 and, in a more restricted form, are present in C++. They are considered an improvement worth adding to the language and their use is commonly encouraged. You even see some C++ people use them as a hack to simulate named parameters in calls. This RFC proposes a move in the opposite direction.

So, to sum up:

Does the new syntax make the language more or less consistent with itself?

Well, it definitely brings objects closer to tuples and functions, whether it's a good or a bad thing. Regarding the tuples, isn't the common advice "when you have more than a couple of fields, consider using an object(|struct|class|whatever) instead of a tuple"? I always supposed the logic behind it is the requirement to be more explicit which objects force you to do. Additionally, I don't really like that positional object initializers will occupy the bit of syntax which in my opinion should be reserved for rhs-typing a tuple value (probably unrealistic to wish, but in essence reasonable from the user's PoV, maybe not PL designer's). I won't repeat my reasoning generally outlined in discussion of #418.

Why does this need to be a part of the language?

Not qualified to assess, so just the questions: As far as I know, for most of the propositions of extending the language the knee-jerk reaction is "why can't it be a macro"? Why not in this case? What are the costs of maintaining this feature?
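As a rough illustration of the "why can't it be a macro" question, the sketch below shows one way a user-level macro could map positional values onto fields for plain objects. The name `construct` is hypothetical and this is not the RFC's implementation; it assumes an object without variant branches and relies on `getTypeImpl` from `std/macros` to read the field list.

```nim
import std/macros

type
  Vector = object
    x, y, z: int

macro construct(T: typedesc, args: varargs[untyped]): untyped =
  ## Hypothetical helper: pairs positional values with the fields of a
  ## plain object in declaration order and emits a named constructor.
  let objTy = getTypeImpl(T)[1].getTypeImpl()   # typedesc[T] -> T -> ObjectTy
  expectKind(objTy, nnkObjectTy)
  var names: seq[string]
  for defs in objTy[2]:                         # the RecList of field definitions
    expectKind(defs, nnkIdentDefs)              # variant branches are not handled
    for i in 0 ..< defs.len - 2:                # last two slots: type and default
      names.add defs[i].strVal
  if names.len != args.len:
    error("expected " & $names.len & " values, got " & $args.len, args)
  result = newTree(nnkObjConstr, T)
  for i, name in names:
    result.add newTree(nnkExprColonExpr, ident(name), args[i])

let v = construct(Vector, 1, 2, 3)   # expands to Vector(x: 1, y: 2, z: 3)
doAssert v == Vector(x: 1, y: 2, z: 3)
```

A helper like this still needs its own call syntax and does not cover object variants, which is part of what a compiler-level feature would add.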

Why is this better than alternatives?

converter bandersnatch(x: (string, int, Adj)): Bandersnatch = Bandersnatch(name: x[0], age: x[1], kind: x[2])

# Three ways to init an object positionally with a converter

let snark = ("Snark", 147, Galumphing).Bandersnatch
let jabberwocky = Bandersnatch(("Jabberwocky", 147, Galumphing))
let boojum = bandersnatch ("Boojum", 147, Mimsy)
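A self-contained sketch of the converter alternative follows. The definitions of `Adj` and `Bandersnatch` are assumptions filled in for illustration, since the comment above does not show them, and only two call forms are used here: an explicit call and an implicit conversion driven by a type annotation.

```nim
type
  Adj = enum
    Galumphing, Mimsy
  Bandersnatch = object   # assumed layout, inferred from the converter body
    name: string
    age: int
    kind: Adj

converter bandersnatch(x: (string, int, Adj)): Bandersnatch =
  Bandersnatch(name: x[0], age: x[1], kind: x[2])

# Explicit call of the converter with a tuple argument.
let boojum = bandersnatch(("Boojum", 147, Mimsy))
# Implicit conversion: the annotation asks for Bandersnatch, the value is a tuple.
let snark: Bandersnatch = ("Snark", 147, Galumphing)

doAssert boojum.kind == Mimsy
doAssert snark.name == "Snark"
```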



Precedence

Help me out with this one. C, C++. What else? More languages kind of have this via function call, such as Python's `__init__` or Rust's `new`, but we also can have this in Nim already so it doesn't qualify.

PS: a couple of funny fitting quotes from one of the docs below:

"Creeping featurism. The proposal adds a minor notational convenience to the language."

"A bad precedent for adding a feature without dire need."

[1] You can browse through them to see better executed arguments for using named parameters in function calls, most of which also apply to object initialization:
    - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4172.htm
    - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0671r2.html
    - https://www.open-std.org/jtc1/sc22/wg21/docs/papers/1992/WG21%201992/X3J16_92-0010%20WG21_N0088.pdf

Araq commented 1 year ago

  1. The implementation effort is not that high.
  2. The argument is not that "it's too much to type", the argument is that the redundancy harms readability and that is harder to mitigate via tooling.
  3. You don't have to use the shorter syntax.
  4. Other languages have a comparable syntactic shortcut and almost nobody complains.
  5. Let's be honest here: If Nim had that shortcut from day one you would have never ever raised a complaint about it either.

I honestly don't understand why you are so vehemently against it. Here are a couple of usages inside the stdlib and the Nim compiler that would become more readable afterwards.

Now:

uint64x2(hi: hi, lo: lo)
HSlice[T, U](a: a, b: b)
NimStringV2(len: len, p: p)
(ref AssertionDefect)(msg: msg, parent: nil)
StackTraceEntry(procname: it.procname, line: it.line, filename: it.filename)
StackTraceEntry(procname: procname, filename: filename, line: line)
LineInfo(filename: n.getFile, line: n.getLine, column: n.getColumn)
Timezone(name: name, zonedTimeFromTimeImpl: zonedTimeFromTimeImpl,
         zonedTimeFromAdjTimeImpl: zonedTimeFromAdjTimeImpl)
TimeInterval(nanoseconds: -ti.nanoseconds, microseconds: -ti.microseconds,
             milliseconds: -ti.milliseconds, seconds: -ti.seconds,
             minutes: -ti.minutes, hours: -ti.hours, days: -ti.days,
             weeks: -ti.weeks, months: -ti.months, years: -ti.years)
IdGenerator(module: m.itemId.module, symId: m.itemId.item, typeId: 0)
ItemId(module: x.module, item: x.symId)

After RFC:

uint64x2(hi, lo)
HSlice[T, U](a, b)
NimStringV2(len, p)
(ref AssertionDefect)(msg, parent: nil)
StackTraceEntry(it.procname, it.line, it.filename)
StackTraceEntry(procname, filename, line)
LineInfo(n.getFile, n.getLine, n.getColumn)
Timezone(name, zonedTimeFromTimeImpl, zonedTimeFromAdjTimeImpl)
TimeInterval(-ti.nanoseconds, -ti.microseconds,
             -ti.milliseconds, -ti.seconds,
             -ti.minutes, -ti.hours, -ti.days,
             -ti.weeks, -ti.months, -ti.years)
IdGenerator(m.itemId.module, m.itemId.item, typeId: 0)
ItemId(x.module, x.symId)

metagn commented 1 year ago

I agree that it's harmless but maintaining it should not take priority (I understand it might also not need maintenance). I think the strategy for this historically has just been putting it in the experimental manual but off the top of my head I can't name similar nonessential features

arnetheduck commented 1 year ago

the argument is that the redundancy harms readability and that is harder to mitigate via tooling.

This is perhaps the argument that doesn't quite hold water - ie this syntax hides an essential part of object construction: which field is being assigned to, instead asking the reader of the code to first look up the object definition and then remember the order. "Smarter editors" is kind of a poor argument in general, since it relies on the Rube Goldberg approach of introducing a less informative syntax and then changing all editors out there to win back the lost information.

The syntax is only redundant under a very specific condition: when the name of the argument matches the field name exactly - under all other conditions, this proposal removes information that the reader has to recall through other means - Vector(x: 1, y: 2, z: 3) is not redundant the way Vector(x: x, y: y, z: z) is.

A more pointed argument "for" this proposal would be that spelling out the field names suffers from "too much information" - ie people should reasonably learn to remember field orders and if field order in an object changes, well, too bad for all code out there.

You don't have to use the shorter syntax.

This is not a valid argument: after this proposal, changing field order breaks backwards compatibility - as a library author, you don't control how the users of your library interact with it, the language rules do. By permitting this syntax, we will see more applications break overall whenever libraries change field order: this means for example that in the standard library, we must never ever again change field order of any object.
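The compatibility concern can be sketched as follows. `PointV1` and `PointV2` are hypothetical names standing for the same library type before and after a field reorder, and the `init*` procs stand in for the proposed shorthand by mapping positions to fields in declaration order.

```nim
type
  PointV1 = object   # hypothetical library type, original field order
    x, y: int
  PointV2 = object   # the same type after a hypothetical reorder
    y, x: int

# Stand-ins for the positional shorthand: position i goes to field i.
proc initV1(a, b: int): PointV1 = PointV1(x: a, y: b)
proc initV2(a, b: int): PointV2 = PointV2(y: a, x: b)

# Named construction means the same thing before and after the reorder.
doAssert PointV1(x: 1, y: 2).x == PointV2(x: 1, y: 2).x

# The same positional call site silently changes meaning.
doAssert initV1(1, 2).x == 1
doAssert initV2(1, 2).x == 2
```

Because both fields have the same type, nothing in the type system flags the change.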

The safer option here is to introduce the shorthand syntax for the "same name" case only - this one still allows safe casual reordering of fields in an object, and promotes a style that is good for readers of code in general, namely that of consistently naming the same thing the same way.

ZoomRmc commented 1 year ago

Readability is not about "how much" but "how clear". The After RFC example is obviously not less readable (in this specific case, where the variable names were carefully chosen), but it falls into the category of code which I should be cautious about (as opposed to just scanning through), so there's a considerable chance I'd need to go to the declarations.

If Nim had that shortcut from day one you would have never ever raised a complaint about it either.

As if I never complained about anything Nim has. :)

Other languages have a comparable syntactic shortcut and almost nobody complains.

Other languages in this case are C, C++. People complain about almost all features of the language. Sorry I couldn't find specific examples. What are other languages?

I honestly don't understand why you are so vehemently against it

Too bad my attempt to explain it thoroughly in the previous post failed. Was it too long to be convincing?

PS: make Type(value) syntax universal for calling the default, make it overridable (excluding base types, perhaps), make type conversion a regular call of this. Then, Foo(a, b) is for tuples, Bar(a: x, b: y) is for objects.

Araq commented 1 year ago

The safer option here is to introduce the shorthand syntax for the "same name" case only - this one still allows safe casual reordering of fields in an object, and promotes a style that is good for readers of code in general, namely that of consistently naming the same thing the same way.

That conflates the object's field scope with the current scope and thus would be a weird special case in the language whereas this RFC makes object construction more similar to routine calls.

Araq commented 1 year ago

What are other languages?

Rust in particular.

ZoomRmc commented 1 year ago

What are other languages?

Rust in particular.

I've already answered that this is not true. They permit only the same-name shorthand as in #418.

struct Foo {a: u8, b: u32, c: bool}

fn main() {
    let a = 42u8;
    let b = 90210u32;
    let c = true;

    let foo = Foo {a: 0, b: 1, c: false};
    let bar = Foo {a, b, c};
    // let xyz = Foo {a, b, false};   // Error!
    // let baz = Foo {0, 1, false};   // Error!
}

You can check the commented lines really produce errors in the playground.

Araq commented 1 year ago

that this is not true.

Yes, but it's "similar". Would you mind Rust's solution too then? Because I would, see above.

metagn commented 1 year ago

In Rust you have to declare the struct as a tuple struct which names the fields 0, 1 and so on in order to use the tuple initialization syntax https://doc.rust-lang.org/reference/expressions/struct-expr.html

Java records/Kotlin data classes/Scala case classes allow naming these fields and allow positional constructors however the declaration syntax is still different like record Foo(int a, int b)/data class Foo(val a: Int, val b: Int)/case class Foo(a: Int, b: Int)

Again, we can require {.positional.} on the object type declaration, which will mirror this, but kind of defeats the purpose

ZoomRmc commented 1 year ago

Yes, but it's "similar". Would you mind Rust's solution too then? Because I would, see above.

I probably just wouldn't care. If you think #418 is problematic, not having it is fine by me. This RFC has negative value, on the other hand.

In Rust you have to declare the struct as a tuple struct

Tuple structs are just Nim's tuples.
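For readers less familiar with Nim tuples, a minimal sketch of what they already allow: a named tuple type can be built positionally or with field names, and its fields can be read by name or by index. `Vec3` is a hypothetical name used only here.

```nim
type
  Vec3 = tuple[x, y, z: int]   # hypothetical named tuple type

let a: Vec3 = (1, 2, 3)              # positional construction
let b: Vec3 = (x: 1, y: 2, z: 3)     # named construction

doAssert a == b
doAssert a.x == 1      # access by field name
doAssert a[2] == 3     # access by position
```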

Araq commented 1 year ago

Again, we can require {.positional.} on the object type declaration, which will mirror this, but kind of defeats the purpose

Why? Sounds like a reasonable compromise.

metagn commented 1 year ago

I was going to say it just looks like a macro pragma that defines a constructor proc but there are some things that aren't possible with that like constructing different object variant branches

Also it forces tons of types to be declared as {.positional.} beforehand which is inconvenient, unless we allowed something like

type
  Foo = object
    x, y: int
  Bar {.positional.} = Foo
echo Bar(1, 2)

arnetheduck commented 1 year ago

Also it forces tons of types to be declared as {.positional.} beforehand

seems pretty reasonable for what several people have reacted to as a "dangerous" feature in general and "useful" in special cases only - ie this way, the author of the type assumes responsibility for not changing the field order and tells "consumers" of the type about it: I don't foresee it being applicable to "tons" of types outside of those that already have a well-established "natural" order (such as Vector indeed where the order of coordinates is pretty much always x, y, z).

it also doesn't prevent "future" acceptance of the more specific name-matching rule, which, although it's a bit different than usual in Nim, still could be explored separately from this RFC and alleviate some of the more "obvious" redundancy.

metagn commented 9 months ago

This has stalled for a while now. I now agree that requiring {.positional.} is probably fine, and apologize for implying this feature is clutter. Something not mentioned above is that, since this is construction syntax, it's a template for a pattern matching case too.

Should be fine for 2.2, though there are a couple more steps we could take, like allowing tuple types instead of just objects in these constructors and in normal object constructors for named tuples.