wren-lang / wren

The Wren Programming Language. Wren is a small, fast, class-based concurrent scripting language.
http://wren.io
MIT License

[RFC] Support argument labels as part of method signatures #1112

Open conchis opened 1 year ago

conchis commented 1 year ago

One of the things I like about Smalltalk is the ability to define keyword labels that describe method arguments. Swift has a similar concept:

func point(x: Double, y: Double) -> Point is called as point(x: 2.0, y: 3.0). You can define another point method func point(r: Double, theta: Double) -> Point and call it as point(r: 1.0, theta: 1.571).

1/ In both languages the labels x:y: and r:theta: are part of the method signature, so the two are always considered different methods. You can’t accidentally call point(r:, theta:) when you mean to call point(x:, y:).

2/ In Swift, labels are the default, and you must opt out of them (by writing _ as the external name) if you want positional arguments. It is also possible to design a language where labels are opt-in: you get positional arguments by default, but labels if you want them.

These are different from the keyword arguments available in many languages. In Ruby, for example, you can define a method def point(x:, y:) and call it as point(x: 2.0, y: 3.0), but whenever you call point, you always call the same point method.

Here is my suggestion for Wren:

1/ Add optional argument labels that are part of the method signature, as arity is part of a signature today. This would allow the programmer to define two different methods point(x: x, y: y) and point(r: r, theta: theta) and have them mean two different things.

2/ Argument labels would be allowed only on one or more arguments at the end of the argument list. A labeled argument followed by an unlabeled one would not be allowed.

3/ Do not add default argument values. I really like Wren’s approach of using the text at the call site to determine the method to be called. My suggestion would just add an optional new thing (argument labels) to the method signature.
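Rule 2 (labels only on trailing arguments) is simple to check mechanically. The following is a minimal C sketch, not Wren compiler code: it assumes a hypothetical representation where each argument's label is a string, or NULL when the argument is positional, and the name `labels_are_trailing` is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check for rule 2: once a labeled argument appears,
   every following argument must also be labeled.
   labels[i] is the label of argument i, or NULL if it is positional. */
bool labels_are_trailing(const char *const labels[], size_t count) {
    assert(count == 0 || labels != NULL);
    bool seen_label = false;
    for (size_t i = 0; i < count; i++) {
        if (labels[i] != NULL) {
            seen_label = true;
        } else if (seen_label) {
            /* A positional argument after a labeled one: not allowed. */
            return false;
        }
    }
    return true;
}
```

Under this rule a call shaped like point(x, theta: t) would be accepted, while point(x: 1, y) would be rejected at compile time.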

Here are some consequences of the addition:

1/ Existing programs would continue to run. This does not change the language for any program that has been written up to this point. Programmers who would rather not use argument labels can just ignore them.

2/ Since argument labels would be handled at compile time (as part of the method signature), this change would not have any cost in terms of interpreter speed.

3/ Argument labels could make the programmer’s intentions much clearer when reading a method call. A method point(i, j) with arity 2 does not really communicate the meaning of its arguments, and the programmer could accidentally reverse them or pass in the wrong values. point(r: 1.0, theta: PI/2) is easier to understand without looking up the method’s definition.
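The "no interpreter cost" point rests on how Wren dispatches: roughly, the compiler turns each call's signature string into an integer symbol, and at runtime the interpreter indexes the receiver class's method table with that integer. The C sketch below is a simplified stand-in for that interning step, not Wren's actual symbol table code, and the labeled spellings such as `point:x:y(_,_)` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMBOLS 64

/* A toy symbol table: each distinct signature string gets a stable index.
   Stored pointers must outlive the table (string literals in this sketch). */
typedef struct {
    const char *names[MAX_SYMBOLS];
    int count;
} SymbolTable;

/* Returns the existing symbol for `name`, or adds it and returns a new one. */
int symbol_ensure(SymbolTable *table, const char *name) {
    for (int i = 0; i < table->count; i++) {
        if (strcmp(table->names[i], name) == 0) return i;
    }
    assert(table->count < MAX_SYMBOLS);
    table->names[table->count] = name;
    return table->count++;
}
```

Because a label only changes which string gets interned, the call instruction would still carry a single integer, which is why the proposal expects no runtime slowdown.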

mhermier commented 1 year ago

I have mixed feelings about this change.

It would be a wonderful tool that would allow more natural manual type dispatch, but the same can be (more or less) achieved with naming conventions.

The main questions I can think of are: what is the cost in terms of line count inside the compiler, is the tradeoff really worth it, and is there a hidden functional issue somewhere...

minirop commented 1 year ago

Your example is exactly what Bob used when explaining why constructors don't have a set name (like "new" or "same as their class"). I can't find it, but it was something like:

class Point {
  construct new(x, y) {
  }
  construct polar(radius, theta) {
  }
}
conchis commented 1 year ago

@minirop I agree. I often write one very general constructor and several small static (class) methods for constructing an object from different inputs. The way to think about the argument labels is a more descriptive way of naming constructors and methods.

@mhermier My guess is that the change could be quite small. The Smalltalk compiler, which uses this approach, is notorious for being tiny. Smalltalk basically folds the labels into the method name. For example, the standard library method evaluate: code notifying: requestor has the name evaluate:notifying:. For my examples above, we could name the two methods point_x:y: and point_r:theta:. If the compiler knows how to do this translation, it's just a matter of a name lookup.
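A sketch of that translation in C, assuming the point_x:y: spelling described above (base name, an underscore, then each label followed by a colon). The function name `mangle_selector` and its buffer-based interface are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends `text` at out[*used], keeping room for the terminator.
   Returns 0 on success, -1 if it would not fit. */
static int append(char *out, size_t size, size_t *used, const char *text) {
    size_t len = strlen(text);
    if (*used + len + 1 > size) return -1;
    memcpy(out + *used, text, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* Builds a Smalltalk-style selector such as "point_x:y:" from a base name
   and its argument labels. Returns the length written, or -1 on overflow. */
int mangle_selector(char *out, size_t size, const char *base,
                    const char *const labels[], size_t count) {
    assert(out != NULL && size > 0);
    size_t used = 0;
    out[0] = '\0';
    if (append(out, size, &used, base) != 0) return -1;
    if (count > 0 && append(out, size, &used, "_") != 0) return -1;
    for (size_t i = 0; i < count; i++) {
        /* Each label is followed by a colon, as in a Smalltalk selector. */
        if (append(out, size, &used, labels[i]) != 0) return -1;
        if (append(out, size, &used, ":") != 0) return -1;
    }
    return (int)used;
}
```

With labels {"x", "y"} this yields point_x:y:, and with {"r", "theta"} it yields point_r:theta:, so the two methods end up with distinct names to look up.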

As @ruby0x1 pointed out, it gets a little more complicated when calling out to C code. It occurs to me that we could use a similar approach for implementing methods with argument labels in C: say, C functions such as point_$x$y and point_$r$theta implementing the two labeled methods.
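Wren's embedding API already binds foreign methods by having the host compare signature strings in a callback (the `bindForeignMethodFn` field of `WrenConfiguration`), so labeled signatures would mostly mean longer strings to compare. The sketch below mimics that shape but is self-contained: it does not include wren.h, the `ForeignFn` type and `bind_foreign` function are hypothetical, the labeled signature spellings are the hypothetical point_x:y: style, and because $ is not a portable C identifier character the functions are named point_x_y and point_r_theta.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a foreign method type: the real Wren type takes a WrenVM*.
   These toy versions just return a tag so the binding can be checked. */
typedef int (*ForeignFn)(void);

static int point_x_y(void)     { return 1; } /* placeholder: cartesian version */
static int point_r_theta(void) { return 2; } /* placeholder: polar version */

typedef struct {
    const char *signature; /* hypothetical labeled signature */
    ForeignFn fn;
} Binding;

static const Binding bindings[] = {
    {"point_x:y:", point_x_y},
    {"point_r:theta:", point_r_theta},
};

/* Returns the C implementation bound to `signature`, or NULL if none,
   mirroring the string comparison a host's bind callback performs. */
ForeignFn bind_foreign(const char *signature) {
    assert(signature != NULL);
    for (size_t i = 0; i < sizeof bindings / sizeof bindings[0]; i++) {
        if (strcmp(bindings[i].signature, signature) == 0) return bindings[i].fn;
    }
    return NULL;
}
```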

Having said that, I have not yet read the Wren source code; it is on my list for the next couple of days.

clsource commented 1 year ago

Maybe something using maps could work. Since the arity would be 1 for both methods, the map could be expanded, or some other technique used, to determine which method must be called.

class Point {
  construct new({"x": 0, "y": 1}) {
   // x and y variables not available
   // but the method would only be called if x = 0 and y = 1
  }

  construct new({"x": x, "y": y}) {
    // x and y variables are available
  }
}

Point.new({"x": 0, "y": 2}) binds to new({"x": x, "y": y})
Point.new({"x": 0, "y": 1}) binds to new({"x": 0, "y": 1})

// Maybe add some sugar too? Don't know if it's possible.

Point.new(x: 0, y: 2) binds to new({"x": x, "y": y})
Point.new(x: 0, y: 1) binds to new({"x": 0, "y": 1})
boucher commented 1 year ago

Using maps leaves you with something pretty close to the way JavaScript handles destructuring. It would remove any ordering and make everything optional though (and you could imagine adding in defaults at that point).

class Point {
  // these two options are one or the other, they couldn't both exist because they both have the same signature

  construct new(opts) {
   // x and y variables not available, opts would be expected to be a map
  }

  construct new({x,  y}) {
    // x and y variables are available, destructured from the map passed in
  }
}

Point.new({"x": 0, "y": 2}) 
// or with some syntactic magic (which I personally don't like)...
Point.new(x: 0, y: 2) 
mhermier commented 1 year ago

@conchis I mean that, as it stands, it is only syntactic sugar: it costs many new lines of code and adds (extra) mangling to symbols. (At worst, external tools could be created to let users determine the mangled names, so that is not the biggest deal.) So unless a killer feature emerges from it, it is just adding too many lines of code and is not desirable. But I think it is interesting enough to be thought through more deeply, to see what can emerge from it.

@clsource maps can be a solution, but they have a small problem in some cases: they lose parameter ordering (and are probably harder to deal with than they should be, though I haven't thought about it deeply).

@boucher, your option is not realistic due to how Wren method dispatch works, but the idea is interesting.

boucher commented 1 year ago

@mhermier I think it would be pretty easy to implement what I suggested, but as I mentioned in the code, you can't have multiple methods with different options, it's more something that makes passing around maps nicer to work with and sort of approximates named arguments in some use cases. If you did this, I don't think there's any reason you couldn't also support destructuring anywhere you declare a var. (And probably could do it with arrays as well)

PureFox48 commented 1 year ago

I'm not keen on this proposal for the following reasons.

One thing I've always liked about Wren is the ability to overload methods based on their arity which I think is a great feature for a simple dynamically typed language and is relatively easy for the VM to deal with.

As someone who's had to wrestle with the complexities of overloading in statically typed languages with their myriad built-in numeric types (C# specification anyone?), this was a breath of fresh air.

I would therefore take a lot of persuading that introducing another factor (namely argument labels) into the overloading mix was a good idea.

I don't see much benefit from argument labels in any case. If I'm calling a method which takes a lot of parameters and I'm worried that I might get the arguments in the wrong order, I simply arrange them vertically and add a comment to each one for checking against the method definition. Using reasonably descriptive names for the parameters also helps, of course.

So, in summary, I just don't feel the proposal has sufficient merit to justify the extra complication.

mhermier commented 1 year ago

Considering the nature of our environment (the LL(2) parser, the lack of an AST, the VM implementation...), I think only the syntax in the original post is viable. But I would consider another way to tell the compiler we want named parameters, and reserve ':' for optional types (for documentation and/or type checking).

@conchis I don't doubt it can be small. What I meant is: if it can be done in a diff of fewer than 20 changed lines (to put an arbitrary limit), that would be best; if the change explodes to 200 changed lines, it is an absolute NO.

@PureFox48, there is a counterargument to your second reason: if you use external libraries, you would rather have compilation stop than discover much later that some parameter change (in order or meaning) introduced a bug that went unnoticed because of a lack of (unit) testing. That said, I still don't consider it a critical argument, since generalizing it would bloat the code (considering what I have seen in videos of Rust IDEs that always display parameter names automagically).

ruby0x1 commented 1 year ago

I'll add a note that a constrained version of the original suggestion is a minor change relative to what's already there.

On the user side, this is purely a clarity thing, where

create(2, 5, false, 100)

becomes much easier to read at a glance. This is a very nice property that makes code nicer to read, with less cognitive load.

create(x: 2, y: 5, wrap: false, width: 100)

A version like this is a tiny specialization of the signature at compile time in wren_compiler.c; the rest of the mechanics are the same and don't change (no real change to the interpreter loop, etc.).

Edit: and yes, keep in mind that optional type annotations use :, like func(arg: Type), but only at the definition, not at the CALL site, so it shouldn't overlap anyway.

mhermier commented 1 year ago

@ruby0x1 without overloading, it is a glorified comment validator. It adds too little value over the common practice @PureFox48 talked about:

create(/* x: */ 2, /* y: */ 5, /* wrap: */ false, /* width: */ 100)
or
create(/* x: */ 2,
       /* y: */ 5,
       /* wrap: */ false,
       /* width: */ 100)

In any case, this behavior should be opt-in; otherwise doubling every signature will make the method dictionary in the VM implode...

ruby0x1 commented 1 year ago

I'm aware, and it's not validated anyway (because of dynamic dispatch, the call site has no information about the definition; it's just positional syntax sugar).

mhermier commented 1 year ago

If we really want to go that route, there is an even simpler solution that requires only one small change: simply allowing parameter annotations as comments. They would act like comments, and we would postpone validating the names until later.

It means:

class Foo {
  static foo(arg) { ... }
}

Foo.foo(my_fancy_argument_description: 69)

would be valid: 'my_fancy_argument_description' would simply be ignored by the compiler for now, and this would require at most 5 extra lines, I guess (maybe a few more if we want to be stricter and check that either all or none of the arguments have parameter annotations).
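The "ignore it for now" variant amounts to letting the argument parser skip an optional identifier-plus-colon prefix. The C sketch below works on one argument's raw source text rather than on Wren's real token stream, and `skip_label` is a hypothetical name; it is only meant to show how little logic the skipping involves.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* If `arg` starts with an identifier followed by ':', returns a pointer to
   the text after the colon (with leading spaces skipped); otherwise returns
   `arg` unchanged. The label itself is discarded, as in the proposal. */
const char *skip_label(const char *arg) {
    assert(arg != NULL);
    const char *p = arg;
    while (*p == ' ') p++;
    /* Identifiers start with a letter or underscore. */
    if (!(isalpha((unsigned char)*p) || *p == '_')) return arg;
    while (isalnum((unsigned char)*p) || *p == '_') p++;
    while (*p == ' ') p++;
    if (*p != ':') return arg; /* e.g. "wrap" or "cond ? a : b" */
    p++;
    while (*p == ' ') p++;
    return p;
}
```

With this, Foo.foo(my_fancy_argument_description: 69) would compile exactly like Foo.foo(69), so existing signatures and bindings are untouched.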

But I don't think it is the sanest route to take.

conchis commented 1 year ago

My guess is that it would be a very small change in any case, as Wren already uses arity as part of the method signature. And the more static version (my original proposal) has multiple advantages:

1/ All the work is done at compile time and the generated bytecode will be absolutely as fast as the original. It does not require building a dictionary nor digging values out of a dictionary.

2/ It does not require the programmer to write code to look for what labeled arguments are available.

3/ It makes intuitive sense that point(x: 1, y: 3) is a different (conceptually related) method than point(r: r, theta: t). These do different things.

4/ The code is much cleaner than code that has to use if statements to decide what to execute.

This idea, making argument labels part of the method selector, was one of the better features of Smalltalk. It made code much more readable in many cases. Also the project I have in mind for Wren, an English language interface for querying and altering object models, would really benefit from this feature.

conchis commented 1 year ago

@mhermier One of the things I did not like about overloading in Java, which uses only arity and argument types, is that it not only did not provide documentation at the call site, but was easy to get wrong. The example I have been using illustrates this: point(Double x, Double y) is the same to the compiler as point(Double r, Double theta) and so Java-style overloading is really asking for trouble!

In my day job I do a lot of my back-end programming in Kotlin, which I like in general, and it supports optional and reordered keyword parameters. But I think its solution to this problem is inferior to the Smalltalk/Swift approach. Methods that have very different inputs should be different methods! But it is still helpful to the reader to have some hint (through the name) that they have similar functions.

mhermier commented 1 year ago

@conchis, I think we all clearly understood the original proposition, we are brainstorming about benefits/cost/consequences and mitigation strategies.

PureFox48 commented 1 year ago

When I referred in my previous post to using comments where there's a lot of parameters, the reason I arrange them vertically is so that I can use line comments rather than block comments.

create(
    2,     // x
    5,     // y
    false, // wrap
    100    // width
)

If I arranged them horizontally instead then I'd have to use block comments which give a rather cluttered, hard to read appearance.

create( 2 /*x*/, 5 /*y*/, false /*wrap*/, 100 /*width*/)

Now, I wouldn't object to there being a more succinct way to label arguments than this but, given that we've already earmarked the use of : for optional type annotations when the method's defined (a much more important usage IMO), I think it would seem very odd to people if we were to use the same symbol for argument labels at the call site as it's not the same thing at all. Also, when I try using other symbols, they all look kinda weird though perhaps a semi-colon would be the best choice. Whatever we used, syntax highlighters would need to be changed to recognize the new symbol.

But, to get back to the original proposal, even if we had overloading based on argument labels, I wouldn't use it to distinguish between point(x: 2, y: 3) and point(r: 1, theta: 1.571). To me, creating points from cartesian and polar coordinates is different enough to warrant using different names for the constructors. Of course, if I wanted to use a library written by someone else, then I'd have no choice but to use argument labels. If I just did point(2, 3) there would be no way for the VM to distinguish between the two constructors unless we had some rule (say, the one defined first wins) which I doubt would be workable.

If I've understood it correctly, @ruby0x1's 'half way house' suggestion would deal with the above situation, though it limits you to two overloads with the same arity and, frankly, I'm still not keen as it seems like a lot of extra complication to solve a problem which is easily worked around in any case.

mhermier commented 1 year ago

This change is problematic in another way: it introduces friction with the setter syntax, in many ways. Which of the following is legal, or looks like the correct answer?

/* 1 */ obj.pseudo_value = v: value
/* 2 */ obj.pseudo_value = (v: value)

/* 3 */ obj[arg0, arg1] = v: value
/* 4 */ obj[arg0, arg1] = (v: value)
/* 5 */ obj[a: arg0, b: arg1] = value
/* 6 */ obj[a: arg0, b: arg1] = v: value
/* 7 */ obj[a: arg0, b: arg1] = (v: value)

I'm personally in favor of syntaxes 2 and 7 only. It would unify the syntax and tie it to something I have had in mind for a while: allowing setters of any arity, which would allow something like:

vector3f=() // Direct set to null
vector3f=(a, b, c)
vectors3f[i]=(d, e, f)

This would make the syntax a little more fluent. In any case, even if it is not supported at the syntax level, I think supporting it could simplify the compiler a little.

conchis commented 1 year ago

Here is a taxonomy that might help clarify the options. It boils down to three questions:

1/ Should argument labels be part of the method signature? (That is: different labels, different method.)

2/ Can labeled arguments be reordered at the call site?

3/ Does the language support default values for arguments?

One example might be:

Argument labels are not part of the signature (but are checked by the compiler), labeled arguments cannot be rearranged, and default values are supported.

conchis commented 1 year ago

Just for fun, here are some examples of how other languages answer these three questions:

# Ruby - Argument labels are not part of the method signature. Default values and
# reordering are supported.

class Point
  def initialize(x:, y:)
    @x = x
    @y = y
  end
end

def point(r:, theta: 0.0)
  Point.new(x: r * Math.cos(theta), y: r * Math.sin(theta))
end

puts point(r: 1.0, theta: 3.14159 / 2).inspect

// Kotlin: Argument labels are not part of the method signature. Default values and
// reordering are supported.

import kotlin.math.cos
import kotlin.math.sin

data class Point(val x: Double, val y: Double)

fun point(r: Double, theta: Double = 0.0) =
    Point(x = r * cos(theta), y = r * sin(theta))

fun main() {
    println(point(r = 1.0, theta = 3.14159 / 2))
}

// Swift - Argument labels are part of the method signature. Reordering is not
//         allowed. Default values are supported.

import Foundation

struct Point {
    let x: Double
    let y: Double
}

func point(r: Double, theta: Double = 0.0) -> Point {
    Point(x: r * cos(theta), y: r * sin(theta))
}

print(point(r: 1.0, theta: 3.14159 / 2))

For Pharo Smalltalk, I am just going to paste the version that is already part of the image (standard library). Argument labels are part of the method signature. Reordering is not allowed. Default values are not supported. This is a Point class method.

r: radius theta: angle
    ^ (radius * angle cos) @ (radius * angle sin)
mhermier commented 1 year ago

Should argument labels be part of the method signature? (That is different labels, different method.)

Yes; otherwise it is no more useful than a glorified comment and brings little value.

Can labeled argument be re-ordered at the call site?

No, because it would mean a combinatorial number of signatures and the VM would implode. So reordering should be done manually by the user at the declaration site; maps would probably be better suited for that.
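To put a number on "combinatorial": if every order of n labeled arguments had to be compiled to its own signature, one method with n labels would need n! signatures. The small C helper below just makes that arithmetic concrete; `reorder_signatures` is an illustrative name, not anything from the Wren codebase.

```c
#include <assert.h>

/* Number of distinct orderings of n labeled arguments, i.e. n!.
   This is how many signatures call-site reordering would require
   if each order were compiled to its own method. */
unsigned long long reorder_signatures(unsigned n) {
    assert(n <= 20); /* 21! does not fit in 64 bits */
    unsigned long long total = 1;
    for (unsigned i = 2; i <= n; i++) total *= i;
    return total;
}
```

Four labels, as in the create(x:, y:, wrap:, width:) example earlier, would already mean 24 signatures for a single method, versus one under the fixed-order proposal.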

Does the language support default values for arguments?

Yes and no: you simulate them the old way, by varying the number of arguments across overloaded methods. Enabling them properly would, I think, require an AST to be practical to implement.

conchis commented 1 year ago

I should mention also that there are perfectly fine languages that do not support argument labels at all: JavaScript, C, C++ and Rust. If you want to allow users to build internal DSLs (my main reason for needing a scripting language) you can sort-of do this by designing classes to support "fluent" sequences of calls to set each argument value.

aosenkidu commented 1 year ago

I would put argument labels into the method signature. And I would greatly simplify the method declaration by simply adding a ":" after the argument name. Special setters and array access operators are IMHO not needed.

point(x:, y:) {
  var pt = Point.new()
  pt.x = x
  pt.y = y
  return pt
}

Compiles to a method called point:x:y(_,_), and that's it. Or put the ":" in front of the parameter name, if you prefer.

mhermier commented 1 year ago

These are implementation details, and as such they do not solve calling conventions...

aosenkidu commented 1 year ago

It is implementation details, and as this it does not solve calling conventions...

No, it is a language detail. The original and some follow-up proposals do not make much sense from a language point of view.

E.g., with point(x: x, y: y) every beginner will ask: "What nonsense is that? Why are x and y repeated?"

Or do I misunderstand what you mean by calling convention? Obviously you call it like this: object.point(x: valueForX, y: valueForY) ...

I focused on how to define the method, as the proposals are not "simple".

clsource commented 1 year ago

In languages such as Gleam, at least, labels are defined like this:

pub fn replace(inside string, each pattern, with replacement) {
  go(string, pattern, replacement)
}

replace(each: ",", with: " ", inside: "A,B,C")
mhermier commented 1 year ago

I mean it is an implementation detail, because what you propose is syntactic sugar to avoid parameter duplication.

The main issue is how labels should be treated and distinguished at the call site. Should foo.call(42) and foo.call(theAnswer: 42) call the same function or not? Once that question is answered, all the implementation details will fall into place. And then we can think about syntax sugar.

PureFox48 commented 1 year ago

Should: foo.call(42) and foo.call(theAnswer: 42) call the same function or not.

Well, I think it would seem odd to people if they weren't calling the same function but, under the original proposal and @ruby0x1's variation of it, they would be calling different functions and so it's yet another reason why I'm not keen on either.

I wouldn't object to argument labels if they were completely optional and not part of the method/function signature as this would be a more succinct way of adding documentation at the call site than just using comments as I do now whilst still being backwards compatible. It should also be relatively easy to implement as the compiler could simply ignore anything before the separator. There would be no need to check that all or none of the arguments were annotated in this way - that would be left to the programmer's discretion.

As I said earlier, I have reservations about using : as the separator because of possible confusion with type annotations when the method/function is declared, but there's no obvious alternative. We can't use = like Kotlin does, because theAnswer = 42 would be an expression in Wren which would assign 42 to the variable theAnswer (which might not even exist) and then pass that as the argument.

ruby0x1 commented 1 year ago

FWIW a lens I like is "it's easy to do, hard to undo".

Making the compiler ignore an identifier before an argument is simple, low friction, backward compatible and optional. It's also cohesive with what's already there regarding arity + signatures. The risk of needing to revert it is relatively low. It has the same format as any future specification or implementation, in that it is "forward" compatible: it can be built upon if needed.

Adding labels at the parameter definition and adding new kinds of signatures is also easy, but would be much harder to undo if need be. The surface area it might affect (bindings, call sites, etc.) is wider.

Building upward is a lot easier than walking back, and stepping stones are a valid form of progress. I'd consider this a stepping stone. I don't think there is much beyond it in the current form of Wren, because there's no way to validate the call site against the parameters (not enough info at the call site).

I also think the : having symmetry with type annotations is nice, because these are argument annotations: both are annotations. So I don't think it's unclear or confusing in any way; it's different contexts. In the same way that . isn't confusing when shared by 3.5 and this.stuff, the context counts and is trivially learnable.

PureFox48 commented 1 year ago

I also think the : having symmetry with type annotations is nice because these are argument annotations, both annotations. So I don't think it's unclear or confusing in any way, it's different contexts. In the same way . isn't confusing when shared by 3.5 and this.stuff the context counts and trivially learnable.

Yeah, fair enough :)

aosenkidu commented 1 year ago

I mean it is implementation detail, because what you propose is a syntax sugar to avoid parameter duplication. The main issue is how labels should be treated and distinguished at call site. Should: foo.call(42) and foo.call(theAnswer: 42) call the same function or not. When an answer to that question is solved, all the implementation details will collapse. And then we can think of syntax sugar.

Erm, seriously? OF COURSE THEY DO NOT CALL THE SAME METHOD! The first one calls foo.call(42) - assuming you do not mean the special call() method, and the second one calls foo.call:theAnswer(42)

Likewise, obj.call(arg1: x, arg2: y) and obj.call(arg2: y, arg1: x) boil down to two different methods.

mhermier commented 1 year ago

Everything is a trade-off. As long as the behaviour is founded on some logic and documented, it is not a real problem. Some decisions prove to be worse in practice but may be a lot easier to implement. As always, we evaluate damage control for each behaviour, to see the impact if we need to change it in the future.