Generalize `init` argument convention with named result slots

Mogball commented 3 months ago

Review Mojo's priorities

[X] I have read the roadmap and priorities and I believe this request falls within the priorities.

What is your request?

Recently, Mojo added named result slots in the form of

fn foo() -> String as output:
    String.__init__(output, "hello!")

This allows assigning directly into the result of a function. Mojo also secretly has an init convention that only applies to the self argument of constructors,

struct Foo:
    fn __init__(inout self): # 'inout' is a lie! This is actually a different convention
        pass

The inout argument convention specifies that the argument value is initialized upon entry and must be initialized on exit. This sneaky constructor inout self is actually uninitialized on entry and must be initialized on exit! This is similar but not exactly the same as out arguments in other languages. Because Mojo is hiding things from the user, it's impossible to spell the type of a constructor:

var ctor: fn(inout self: Foo) -> None = Foo.__init__ # error!

Because Mojo is lying!
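As a rough analogy only (this is Python, not Mojo, and Python has no argument conventions), Python's `__init__` also receives a `self` that has been allocated but not yet populated, which is close in spirit to "uninitialized on entry, initialized on exit". The names `Foo`, `x`, `raw`, and `ctor` below are illustrative:

```python
class Foo:
    def __init__(self):
        # On entry, `self` exists but has no fields yet.
        self.x = 1234  # On exit, the object is fully set up.


# Allocate without running the constructor: the "uninitialized on entry" state.
raw = object.__new__(Foo)
initialized_before = hasattr(raw, "x")

# Python can name the constructor as a plain function and call it on the raw object.
ctor = Foo.__init__
ctor(raw)
initialized_after = hasattr(raw, "x")

print(initialized_before, initialized_after)
```

Python can spell `ctor = Foo.__init__` because nothing about `self` is hidden from the type of the function; the request here is for Mojo to be able to do the same once the convention has a name.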

To regularize this, we should create a new init argument convention to expose this directly, thus

struct Foo:
    fn __init__(init self):
        pass

var ctor: fn(init self: Foo) -> None = Foo.__init__

At the same time, we can change the syntax of named result slots to this as well,

fn foo() -> init out: String:
    String.__init__(out, "hello!")

The only nuance here is that init != out, because you can fieldwise initialize an init value, but not an out value.

struct Foo:
    var x: Int

fn foo():
    var value: Foo
    value.x = 1234
    use(value) # 'value' is not initialized. Must call a constructor!

fn bar(init value: Foo):
    value.x = 1234 # ok, 'whole-object' bit is already initialized on entry

This is an acceptable tradeoff to generalize the init syntax from constructors to any argument and with named result slots.

What is your motivation for this change?

We should do this!

Any other details?

No response

soraros commented 3 months ago

Finally, we can stop lying in constructor signatures!

Does this mean that we can also field initialise the named result slot?

fn f() -> out: S:
  out.data = ...  # currently not allowed

martinvuyk commented 2 months ago

> Does this mean that we can also field initialise the named result slot?

nope:

> The only nuance here is that init != out. Because you can fieldwise initialize an init value, but not an out value.

this would work (as far as I've understood):

fn f(init out: S) -> S:
  out.data = ...

Is there no way to say that the output variable is the one that is already initialized on function entry? That way the two would essentially become synonyms.

fn f(new out_var: S) -> owned out_var: # initialized on exit
  out_var = S.__init__()

fn f(init out_var: S) -> owned out_var: # initialized on entry
  out_var.data = ...

and this would become the syntax sugar

fn f() -> out_var: S: # initialized on exit
  out_var = S.__init__()

fn f() -> init out_var: S: # initialized on entry
  out_var.data = ...

I don't know if this makes any sense language-wise; owned might not be the right concept (bind?). But being able to bind the output to the input might also help simplify lifetime annotations (?)

nmsmith commented 1 month ago

@martinvuyk

All of your examples include a return statement. With the proposed init convention, the idea would be that return is just sugar for initializing the result. So the following:

fn foo() -> Int:
    return 4

Would desugar to:

fn foo() -> (init output: Int):
    output = 4

Given this desugaring, I think your examples are a bit off-track.

martinvuyk commented 1 month ago

Oh yeah, you're right, edited now to remove them. But the ideas still hold, no?

nmsmith commented 1 month ago

I don't think so. For the following example:

fn foo() -> (init output: Int):
    output = 4

The idea is that output is an argument. The only special thing about this argument is that it's listed on the right-hand side of ->. There's no need for this syntax to desugar to anything else. Calling foo looks like this:

x = foo()

You should view x as an input to foo. It makes more sense if you replace = with <-:

x <- foo()

This is just the function signature for foo flipped back-to-front. foo accepts one argument: x.
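A minimal Python sketch of that reading, under the assumption that the result slot can be modeled as an ordinary object the caller passes in. `Slot`, `write`, and both `foo_*` functions are hypothetical names for illustration, not Mojo APIs:

```python
class Slot:
    """Hypothetical stand-in for a result slot that the caller owns."""

    def __init__(self):
        self.initialized = False
        self.value = None

    def write(self, value):
        # Initializing the slot is the only thing the callee has to do.
        self.value = value
        self.initialized = True


def foo_sugared():
    # The familiar spelling: hand back a value with `return`.
    return 4


def foo_desugared(output):
    # The result slot is just another argument; `return 4` becomes a write into it.
    output.write(4)


x = Slot()        # the caller reserves space for the result
foo_desugared(x)  # reads like `x <- foo()`
```

In this model the caller's `x` is uninitialized before the call and initialized after it, which is the contract the proposed convention would state in the signature.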

nmsmith commented 1 month ago

@Mogball I'll share some of my thoughts on the design of this feature.

> The only nuance here is that init != out. Because you can fieldwise initialize an init value

There's a downside to having init imply fieldwise initialization: this isn't something that callers need to care about. The conventions that appear in function signatures should ideally only communicate as much information as is relevant to the caller. So the "correct" public convention is really out. (Or whatever we want to call that.)

That said, I can see the counter-argument: it would be really nice if __init__ wasn't actually a special function, and instead its behaviour could be entirely explained by its signature.

That might be a good enough reason to have init behave as you propose.

Thoughts on the keyword

Somebody on Discord proposed new, and I think that makes a bit more sense, because:

new has a lot of precedent as a keyword in other languages.
"new" is an actual English word, so it makes for a cleaner keyword.
This will help prevent programmers from conflating the new convention with the __init__ method itself. (We don't want people to think these two features are tied to each other.)
This will help us avoid complaints about the redundancy of having to write init twice when defining a constructor.

We probably want a reverse convention as well, e.g. fin. That's what __del__ and __moveinit__ would use. And I imagine we'd still have owned, which doesn't support fieldwise deinitialization.

It would be good if we can find a pair of keywords that complement each other, such as:

init and deinit
assign and deassign
construct and destruct (these keywords are the most accurate!)
create and destroy
set and unset or reset (but we can't actually use these keywords, because set is a Python data type)

Also, verbs make slightly more sense than adjectives, since they directly express that the function is going to "take an action" on the argument. That's a point in favour of init/assign/construct/etc, and a point against new. I also think inout should be renamed to mut, and borrowed should be renamed to read, or a synonym.

nmsmith commented 1 month ago

The keywords construct and destruct would lead to a very simple definition for "constructors" and "destructors" in Mojo:

A constructor is any function that uses the construct convention.
A destructor is any function that uses the destruct convention.

Some specific examples:

fn __init__(construct self) is a constructor
fn __del__(destruct self) is a destructor
fn __moveinit__(construct self, destruct existing: Self) is both a constructor and a destructor

A minor quirk: if -> T desugars to -> construct result: T then technically every function that returns a value would be a constructor, but that's just a technicality I suppose. 🤷‍♂️

martinvuyk commented 1 month ago

> I don't think so. ... The idea is that output is an argument. The only special thing about this argument is that it's listed on the right-hand side of ->.

Correct me if I'm wrong (I'm an absolute layman in compiler stuff), but afaik Python also has the concept of __new__ for classes, where the object is created on exit/inside the function. If we introduce an init/construct keyword, then it seems to me we'd also need to introduce a new/create keyword, and these would be the 2 versions:

fn f1() -> out_var: S: # initialized on exit
  out_var = S.__init__()

fn f2() -> new out_var: S: # initialized on exit
  out_var = S.__init__()

fn f3() -> init out_var: S: # initialized on entry
  out_var.data = ...

where f1() and f2() are considered as having the same signature

> Some specific examples:
>
> fn __init__(construct self) is a constructor
> fn __del__(destruct self) is a destructor
> fn __moveinit__(construct self, destruct existing: Self) is both a constructor and a destructor

As for these, I personally like having a pair of keywords like create & consume and letting construct hang in the void alone since it is a bit special. So there are 3 easy to explain keyword concepts: create & consume, owned & borrowed, and construct. Where any argument with the keyword construct makes the compiler execute the type's __new__() -> create self: Self dunder method or uses some default initialization logic (?)

Sidenote: just as a disclaimer, I'm throwing ideas at the wall and seeing what sticks. What I'm saying might not make much sense on the details since I'm not a compiler engineer. But I like language simplicity and intuitiveness, so that's why I give my input.

nmsmith commented 1 month ago

I think you're getting a few things mixed up.

Firstly, while Python classes have both __new__ and __init__, these methods aren't "argument conventions", they're just ordinary methods that are responsible for executing various steps in the object-creation process. Therefore, just because Python has two such methods, doesn't mean Mojo needs two corresponding argument conventions.

More broadly, irrespective of the above methods, Mojo structs are very different to Python classes, so not many concepts that are relevant to classes carry over to structs. There is no evidence that Mojo structs need anything that resembles __new__.
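To make the first point concrete, here is a small Python sketch (the class `Point` and the `calls` list are illustrative names) showing that `__new__` and `__init__` are two ordinary methods that Python runs one after the other, and that they can also be called by hand:

```python
calls = []


class Point:
    def __new__(cls, x, y):
        # Step 1: allocate the object. No fields exist yet.
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, x, y):
        # Step 2: populate the freshly allocated object.
        calls.append("__init__")
        self.x = x
        self.y = y


# The normal spelling runs both steps in order.
p = Point(1, 2)

# The same two steps, invoked explicitly as plain method calls.
q = Point.__new__(Point, 3, 4)
Point.__init__(q, 3, 4)

print(calls)
```

Neither method says anything about how its arguments are passed; they are just steps in object creation, which is why they do not map onto argument conventions.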

> I personally like having a pair of keywords like create & consume and letting construct hang in the void since it is a bit special

The destruct convention that I'm describing is also a "bit special": it's the exact reverse of construct. construct begins a value's life, and requires its fields to be initialized. destruct ends a value's life, and requires its fields to be deinitialized.

lattner commented 1 month ago

FWIW, my current opinion on naming is to move these things to nouns. Today we have:

struct S:
   fn __init__(inout self):
   fn a(self):
   fn b(borrowed self):  # same as 'a'
   fn c(inout self):
   fn d(ref [_] self):
   fn e(owned self):
fn f() -> String as output:
  output = String()

I think we should move to:

struct S:
   fn __init__(out self):  # output
   fn a(self):
   fn b(ref self):  # same as 'a', immutable reference
   fn c(mutref self):
   fn d(ref [_] self):
   fn e(owned self):
fn f(out output: String): # perhaps also fn f() -> (out output: String):
   output = String()

The rationale here is that we'll eventually allow local references as well:

  mylist = List[Int](...)
  mutref r = mylist[1]
  r += 4

Arguably we could rename owned to var since they are the same thing, but I think that would be confusing.

nmsmith commented 1 month ago

Before we bother repainting the keywords for "references", I would like to publish my bombshell proposal to completely replace lifetimes with something else. 😇

lattner commented 1 month ago

Please move quickly!
