microsoft / verona

Research programming language for concurrent ownership
https://microsoft.github.io/verona/
MIT License
3.58k stars 165 forks

Lower class to class types #280

Closed rengolin closed 4 years ago

rengolin commented 4 years ago

Use the !verona.class<> type, and verona.field_read and verona.field_write operations to access fields.

Methods will be handled in a separate issue.

rengolin commented 4 years ago

WIP: https://github.com/rengolin/verona/tree/mlir-class

rengolin commented 4 years ago

Before I get to field access, I need to lower new at least for POD classes. Classes with other classes as field types could need some ABI support (struct layout, padding, packing).

rengolin commented 4 years ago

WIP: field read/write: https://github.com/rengolin/verona/tree/mlir-field

Needs #299

rengolin commented 4 years ago

After discussions during the MLIR lowering efforts, it became clear that there are a number of issues with the current implementation, and the proper fix is to have two new operators: call and static-call (WIP in the mlir-class branch).

The issues are:

  1. Types are mostly unknown at this time. Past efforts to guess them were fraught with problems, because we don't want to implement type inference before lowering; it should run as a pass after lowering.
  2. Arithmetic operations aren't really machine instructions at this point, and numeric types aren't really native types. They're classes with dynamic methods where the LHS is the object and the RHS is the argument to the call. So, a + b is the same as a.+(b).
  3. Static calls need the type to resolve the function to call. For example, two classes with a method foo would each need to expose their own implementation, and a symbol table for functions would need to tell them apart. But with unknown types, this is impossible before type inference.
  4. Compiler generated functions (like for loop de-sugaring) can call functions (like next) that haven't been declared yet for the particular type, especially if that type is parametric or union, which will only be known after reachability analysis and reification.

So we need a set of call operators that will not fail if the type is unknown or if the symbol being called can't yet be resolved. The standard MLIR call operation will fail under these conditions. Given that we have two clear kinds of calls, static and dynamic, we need two clearly different dialect operators that, after type inference (and possibly reification), will lower to either standard dialect or LLVM dialect calls, whichever makes more sense at the time.
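As a rough illustration of the requirement above, here is a minimal Python sketch of a call operation that can be built before its receiver type is known and only resolved later. The names `CallOp`, `desugar_add`, `resolve` and the symbol-table layout are hypothetical and are not part of the Verona dialect.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model, not the Verona dialect: a call op names its target
# by string, so it can be created before types are known.
@dataclass
class CallOp:
    method: str                   # e.g. "+" or "next"
    receiver_type: Optional[str]  # None until type inference fills it in
    static: bool = False          # static-call vs. dynamic call

def desugar_add() -> CallOp:
    # `a + b` is treated as `a.+(b)`: a dynamic call on the LHS object.
    return CallOp(method="+", receiver_type=None)

def resolve(op: CallOp, symbols: dict) -> Optional[str]:
    # Resolution is attempted after inference. An unknown type or a
    # missing symbol leaves the op unresolved instead of raising.
    if op.receiver_type is None:
        return None
    return symbols.get((op.receiver_type, op.method))

# Assumed symbol table keyed by (type, method), so two classes with a
# method `foo` stay distinct.
symbols = {
    ("U32", "+"): "U32.add",
    ("Foo", "foo"): "Foo.foo",
    ("Bar", "foo"): "Bar.foo",
}

op = desugar_add()
before = resolve(op, symbols)   # type unknown, stays unresolved
op.receiver_type = "U32"        # pretend type inference ran
after = resolve(op, symbols)
```

The point of the sketch is only that building the operation and resolving it are separate steps, which is what a standard call operation with an eager symbol check would not allow.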

Also, following Pony's example, at the LLVM lowering time, we'll treat known numeric arithmetic (ex: U32 a + b) as native LLVM instructions (or calls to the correct intrinsics, or custom lowering). But during the MLIR lifetime, those calls will remain in the Verona dialect, and reachability analysis can use those instead of standard calls without issue.
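A minimal Python sketch of that late decision, assuming a hypothetical lookup table from (type, method) pairs to native instructions. The table contents, the `lower_call` helper and the `@Type.method` naming are illustrative assumptions, not the actual lowering.

```python
# Hypothetical table of numeric methods that have a native LLVM instruction.
NATIVE = {
    ("U32", "+"): "add i32",
    ("U32", "-"): "sub i32",
    ("F64", "+"): "fadd double",
}

def lower_call(receiver_type: str, method: str) -> str:
    # Known numeric arithmetic becomes a native instruction; everything
    # else stays an ordinary call to the method's implementation.
    native = NATIVE.get((receiver_type, method))
    if native is not None:
        return native
    return f"call @{receiver_type}.{method}"

u32_add = lower_call("U32", "+")
foo_add = lower_call("Foo", "+")
```

Until this step runs, both cases are the same kind of call in the dialect, so earlier analyses do not need to special-case numeric types.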

The next PR will completely replace all existing special cases for numeric types and move all calls to verona.call | verona.static-call. This is a large refactoring, but it will clear up the code considerably. It also could not have been done earlier, as the dialect was still under discussion, whereas now we can lower classes and know more or less where this is going.

Hopefully, that will be the last PR of this issue.

plietar commented 4 years ago

While I think having call and static_call operations is a good design, and have no particular issue with it, there is another design which is worth considering, and which I had used in the old compiler.

A static call is decomposed into two operations, verona.static and the usual verona.call operation. The first operation takes a type and encodes it into a runtime value. In the bytecode, this creates a pointer to the type’s descriptor. The result of verona.static T has type verona.static<T> (where T is probably restricted to class types).

A verona.static has all the methods of T that are static: https://github.com/microsoft/verona/blob/master/src/compiler/resolution.cc#L645

See for example the calls to Main.use in https://github.com/microsoft/verona/blob/master/testsuite/ir/compile-pass/loop/Main.test1.ir.txt.

One of the benefits of this representation is that it enables static calls on type parameters, eg X.foo(), to be implemented without monomorphization. During lowering, type parameters are transformed into extra runtime parameters that are these pointers to descriptors.
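To make the idea concrete, here is a small Python sketch of a generic function whose type parameter has been turned into an extra runtime argument holding a descriptor. The `Descriptor` class, the `static_methods` dictionary and the function names are hypothetical and do not reflect the old compiler's actual data structures.

```python
class Descriptor:
    """Hypothetical runtime stand-in for a type's descriptor."""
    def __init__(self, name, static_methods):
        self.name = name
        self.static_methods = static_methods

# Source form (illustrative): bar[X]() { X.foo() }
# Lowered form: the type parameter X becomes an ordinary runtime parameter.
def bar(x_desc):
    return x_desc.static_methods["foo"]()

A = Descriptor("A", {"foo": lambda: "A.foo"})
B = Descriptor("B", {"foo": lambda: "B.foo"})

# One compiled copy of `bar` serves both instantiations.
result_a = bar(A)
result_b = bar(B)
```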

I’m happy to ignore this design completely, or consider it later in the process if/when we add non-monomorphized lowering of generics.

rengolin commented 4 years ago

That's an interesting proposal.

So, IIUC, the result of verona.static<T> will always be the same and the return value will only be a placeholder for a "global" descriptor (not an object) that only has access to static calls. Ex:

class Foo {
  static apply(a: U32) : U32 { ... }
}
main() {
  return Foo.apply(42);
}

would be:

%0 = constant 42
%1 = verona.static "Foo" : !verona.class<"Foo">
%2 = verona.call %1, %0
return %2

And, no matter how many times I call verona.static on the same class type, it always returns the same global (even if through different SSA values). So in:

%1 = verona.static "Foo" : !verona.class<"Foo">
%2 = verona.call %1, %0
%3 = verona.static "Foo" : !verona.class<"Foo">
%4 = verona.call %3, %0

Both %1 and %3 refer to the same type descriptor and the calls are identical.
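The "same global behind different SSA values" property can be sketched in Python as an interned lookup. The `_DESCRIPTORS` registry and the `static` helper are hypothetical names used only for illustration.

```python
class Descriptor:
    """Hypothetical per-class descriptor; one exists per class name."""
    def __init__(self, name):
        self.name = name

# Assumed global registry: one descriptor per class, created on first use.
_DESCRIPTORS = {}

def static(class_name):
    desc = _DESCRIPTORS.get(class_name)
    if desc is None:
        desc = Descriptor(class_name)
        _DESCRIPTORS[class_name] = desc
    return desc

v1 = static("Foo")   # plays the role of %1
v3 = static("Foo")   # plays the role of %3
other = static("Bar")
```

Because the operation has no side effects and always yields the same value for the same type, repeated uses could in principle be merged by common subexpression elimination.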

I'm not yet sure how this will be handled wrt. vtable lookups, etc. (do static methods have pointers in the vtable, too?), but I think it will depend on the semantics of these descriptors (in comparison to instances) on how to find the actual method to call.

I'll change the implementation to this design and see how it goes.

Thanks!

plietar commented 4 years ago

Yes, that’s pretty much it, except for the fact that the result of the static operation can’t be class<C>, but something slightly different, in order to distinguish between instances of the class and the descriptor of the class.

In terms of descriptor/vtable layout, I do think we want static methods in the vtable anyway, in order to allow static method calls on values, ie. an interface has a static method, but the call is issued on a specific instance of the class that implements that interface.

interface I {
  apply();
}

foo(x: I & mut) {
  x.apply() // Dynamic dispatch to a static method
}

I would make it so that descriptors have a layout that is akin to immutable objects embedded in the .text section (or .rodata maybe), ie. they have their own “descriptor” pointer. That pointer could very well be pointing back to the descriptor itself for example. This way the semantics of call is identical regardless of whether the method is static or not: find the descriptor at a fixed offset from the receiver, load the function pointer from that descriptor, call the function.
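A minimal Python sketch of this uniform dispatch, assuming a descriptor whose own descriptor pointer refers back to itself. All names here (`Descriptor`, `Instance`, `call`, `vtable`) are hypothetical, and a Python dictionary stands in for the fixed-offset loads a real layout would use.

```python
class Descriptor:
    """Hypothetical descriptor laid out like an immutable object."""
    def __init__(self, name, vtable):
        self.name = name
        self.vtable = vtable
        self.descriptor = self   # the descriptor pointer points back to itself

class Instance:
    def __init__(self, descriptor):
        self.descriptor = descriptor

def call(receiver, method, *args):
    # Same steps for every call: find the descriptor from the receiver,
    # load the function pointer, call it with the receiver first.
    fn = receiver.descriptor.vtable[method]
    return fn(receiver, *args)

def foo_apply(_receiver, a):
    # Static method: the receiver argument is passed but never used.
    return a + 1

def foo_name(self_):
    # Non-static method: reads from the receiver.
    return "instance of " + self_.descriptor.name

FOO = Descriptor("Foo", {"apply": foo_apply, "name": foo_name})
obj = Instance(FOO)

via_descriptor = call(FOO, "apply", 41)  # static call on the descriptor
via_instance = call(obj, "apply", 41)    # dynamic dispatch to the static method
name = call(obj, "name")
```

Note that `foo_apply` accepts a receiver it ignores; that is the cost of giving static and non-static methods the same calling convention in this sketch.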

There is one potential downside, which is that in order to make static and non-static methods “ABI equivalent”, static methods must take a useless receiver (the pointer to the descriptor), which in a naive implementation (ie. the old compiler) wastes an argument register. There are probably ways we can avoid this.

In practice, however, these calls will generally be optimized and dispatched statically anyway, at least until we add non-monomorphized generics, of course.

rengolin commented 4 years ago

> Yes, that’s pretty much it, except for the fact that the result of the static operation can’t be class<C>, but something slightly different, in order to distinguish between instances of the class and the descriptor of the class.

Right. We may need to add a new type for this, ex. !verona.descriptor<class<C>>.

> In terms of descriptor/vtable layout, I do think we want static methods in the vtable anyway, in order to allow static method calls on values, ie. an interface has a static method, but the call is issued on a specific instance of the class that implements that interface.

I imagine the only thing you need the instance for is to get its type and call the function via its descriptor, as the dynamic members of the instance will be unavailable to the static methods.

> In practice, however, these calls will generally be optimized and dispatched statically anyway, at least until we add non-monomorphized generics, of course.

I'm hoping we can optimise all static calls before we get to LLVM IR, as LLVM probably won't know that the contents of the dispatch table can't change, and so won't be able to do that optimisation itself. In that case, having them in the vtable would be counter-productive.

But at the stage we're talking about right now, the actual implementation is a consideration, not a driving design decision. We should be able to lower the static + call pair in a multitude of ways anyway.