terralang / terra

Terra is a low-level system programming language that is embedded in and meta-programmed by the Lua programming language.
terralang.org
Other
2.72k stars 200 forks source link

How to implement an object system in Terra? #345

Open capr opened 5 years ago

capr commented 5 years ago

Hi! Me again :)

I'm trying to implement an object system on top of Terra that allows both overriding and overloading and I can't seem to find a way to do it, despite a week of struggling.

I should probably mention that my intention at first was not to write a language extension since I don't want to introduce new syntax, but I went down that road because I couldn't do what I wanted with existing metamethods. In particular, I need to control what happens when a new method is defined in order to implement auto-overloading behavior, see #336 (I also wanted metamethods for assignments in order to implement properties - see #340, but I gave up on that update: that was solved).

So by making a language extension I thought I could implement auto-overloading easily, but that added in more problems. First, you can't just create a bunch of interdependent Terra functions because the eager type-checking will bail on recursive references. You can't stub them all and add the body on a second phase with resetdefinition() either because there's no API to get the return type of a quote (#339) so you'd have to give up return-type inference. Another thing I tried was to build dependent methods on a __methodmissing hook, but there's no API to select an overloaded definition from an overloadedterrafunction object given a set of arg types (probably what tryinsertcasts does), so you wouldn't know how to return the right definition.

I also studied javalike.t and it is using an undocumented "sturct constructor" syntax which receives the newly created type at a very convenient moment when all methods are specialized but not yet compiled, which allows the entire methods table to be replaced with vtable stubs and everything just works afterwards. This is a luxury I don't have when creating functions "by hand" I suspect because of eager type-checking. An API to pause and resume compilation (i.e., enable/disable eager type checking) might help here, if I understand the issue correctly that is.

There's always the option of making a parallel language from scratch that compiles to Terra but I would end up with a second Terra implementation with semantics slightly off, and a lot of code duplication. So the idea of gradually extending Terra is much more appealing to me, but I think that more of Terra's guts would have to be exposed for that.

I'm open to suggestions on how to do this differently of course :)

OvermindDL1 commented 5 years ago

Sounds like the XY Problem to me? An object system is almost never what one actually wants except in very specific circumstances.

What are you actually trying to achieve overall? Why do you think you need overloading? Does your overloading need to be run-time or compile-time? Etc... Etc... :-)

capr commented 5 years ago

@OvermindDL1 Well I'm not sure that justifying the need for an object system or overloading is within the scope of the issue nor is the lack thereof a symptom of the XY problem :) Besides, how can one justify the need for overloading? Everyone knows what it is, what it's good for and what to do when it's not available in the language.

OvermindDL1 commented 5 years ago

At the very least the implementation significantly differ depending on use, so at least the base question of whether it needs to know the types at compile-time or run-time will significantly change what is done.

Like in a poor OOP system like Java or C++ it does runtime dispatch, but in a more powerful dispatch system like OCaml or Haskell or so then it does compile-time dispatch with witness passing for dynamic dispatch.

Hence still knowing what it is going to be used 'for' helps to craft the best answer. ^.^

capr commented 5 years ago

Overloading means dispatching based on both name and argument types at compile time. Overriding or virtual functions means dispatching at runtime based on the type of the object that was created (usually implemented as a vtable). I'm not aware of other definitions for these concepts.

The use case is rewriting this https://luapower.com/oo so I can then rewrite this https://luapower.com/ui.

OvermindDL1 commented 5 years ago

Overloading means dispatching based on both name and argument types at compile time.

Not always, like LISP has multi-method runtime overloading via log(N) dispatch (as efficient as multi-methods can potentially get without significant memory overhead). Runtime overloading can be handled via witnesses as well, think of a variant dispatcher.

Overriding and virtual functions being runtime only is a C++/Java'ism and do now need to be, like take OCaml and having one module override another, that happens at compiletime including all resolutions thereof on that, needing to pass it as an opaque time means using it as a witness thus converting it to runtime dispatch, compare to Haskell where a similar action stays compile-time even through witness passing (a 'witness' is like the this argument in C++/Java for note) because of HKT resolutions.

There are lots and lots of patterns here with lots and lots of different efficiencies, even far more so than what I just listed above (those are just the most well known aside perhaps prototypical dispatch, which can get into a whole new world of different styles depending on the language). :-)

https://luapower.com/oo

Hmm, most of the examples given are very poor uses of OOP and are more suited for ADT's, but it doesn't seem to support method overloading it all. It supports pre/post call hooks (the 'chaining' pattern) but doesn't seem to support overloading at all. I'm guessing that's a new feature that you are wanting to add in? Shouldn't be too hard to do if so as long as at the very least all types needed are known ahead of compilation type.

https://luapower.com/ui

Now this is exceptionally bad use of OOP! ^.^; If done in OOP it would be better if this followed a mixer pattern instead of a poor hierarchy tree, although something like a component style (or more efficiently an ECS) might work a lot better and be far more flexible.

But yeah, duplicating their API's shouldn't be too hard, and it doesn't look like you need overloading at all as they 'emulate' overloading via the chaining pattern, which is pretty trivial to do. and if you pass the continuation callback inside of the calls (which it doesn't do so that would be an API change) then you could have each stage control whether or not the stage after is run. In essence you could emulate overloading via O(N) dispatch with their API. If you want to change their API and support proper overloading then you'd have to support doing it at runtime since the 'methods' in that API can be overridden either entirely on-object or via child object definitions. As for type testing then you'd need to pass in a witness to dispatch on, either the C++/Java style where the witness is baked into the complex types or the explicit way via a hidden witness argument passed in (though if you go with the latter then the function type would have to be pre-declared on the base-most class that it exists on for it to be an overloadable method, where the former method means that every-single-type will be a little bit more heavy in terms of memory and allocation, only by a pointer to a static object added on each, but still the cost is non-free there compared to on-demand witness objects).

capr commented 5 years ago

Oh gawd, why is there on every place on the Internet these days a dude who hijacks threads baiting people into justifying trivial issues so that he can throw in a lecture to show off his vast encyclopedic knowledge, all the while hoping that no one notices the psychology behind this style of communication game? You never had any intention of understanding the issue or providing any relevant feedback from the start, did you? You couldn't possibly infer the obvious meanings of overriding and overloading in the context of a language like Terra, I had to define them for you so that you can tell me how LISP does it, isn't it? Oh, and Haskell, always throw in Haskell. Yeah sure, I'll do ECS this time because "it's so much better". There, I just provoked you again so you can start preparing your lecture on ECS next (it was a joke, don't do it, please, please don't do it!).

OvermindDL1 commented 5 years ago

in the context of a language like Terra

Terra is a language for simple LLVM calling in an easy environment with a variety of helpers for building transforms and languages both to compile to the LLVM backend, these are the topics that do directly relate to what Terra is for.

If it was for the expressed purpose of 'just' the Lua layer, why not use the Lua libraries as they are in that case. Just converting them to perform LLVM calls is essentially writing a language transformer, in which case all these topics are important unless efficiency does not matter whatsoever (in which case just use a lua interpreter and use the libraries directly).

The languages picked as examples were picked because of their popularity and direct usages of the specific examples of mind. I don't particularly like Haskell at all, but it was a useful example of a specific pattern. The words like override, overload, etc... change depending on the specific pattern that is chosen, of which the best one of that depends on the purpose itself.

What was curious was the speaking of overloading, which in an LLVM/Terra context refers to type-based compiletime only overloading, this form fails when those overloads are used dynamically through a function pointer, which is what an overrideable object system uses at its core, thus mapping it to the linked oo Lua library style of doing things makes no sense until a specific style of doing it is chosen first, especially as that linked oo library has no concept of overriding at all, only overloading (these being used via their definitions in an oo context and terra/llvm context respectively).

justifying trivial issues

These are not trivial issues, you are talking about writing an OOP system to compile to low level assembly via LLVM, of which there are a monumental amount of forms of doing it, like that Lua library is more of a prototype style than it is OOP.

show off his vast encyclopedic knowledge

This is nothing of the sort, this is bog-standard grunt work for writing compilers, which is what writing terra code 'is'.

You never had any intention of understanding the issue or providing any relevant feedback from the start, did you?

I very much did, hence the multiple clarification questions within, which if all where answered then it would be trivial at this point to supply the terra code in question that would handle the style wanted. As it is the linked lua OO library, other than not being OO and rather a prototype model, is dynamic enough to make a raw LLVM translation of it directly impossible without using a global variant to map all possible representable types as lua itself already does, at which point you are just inefficiently rewriting a built of luajit itself inefficiently due to lack of tracing abilities. Figuring out how the types should work is required to write efficient LLVM code.

You couldn't possibly infer the obvious meanings of overriding and overloading in the context of a language like Terra

In the context of Terra could either be in the context of Lua or the context of LLVM as both the Lua side of it and the LLVM side of it have different meanings of those words. The lua side has overriding as protototypical replacement of a binding and has only dynamic dispatch overloading, while the LLVM side of terra has overriding as function pointer rebinding and overloading as a pure compile-time construct with no build-in ability for dynamic dispatch.

I'm trying to implement an object system on top of Terra that allows both overriding and overloading and I can't seem to find a way to do it, despite a week of struggling.

This was the initial mention, this does not state whether it should be overriding and overloading at your definition layer, I.E. running entirely in Lua, or if it is at the code generation layer, I.E. running entirely in generated LLVM output.


If someone actually takes time to ask leading questions with examples to try to figure out what is actually being done so they can actually write you code that shows how to do it, it's usually polite to answer those clarification questions. As it stands based on the information given then it would be entirely reasonable to assume using it all at the Lua layer, which would make perfect sense if you want to, for example, make a UI that lets users use it to define whatever that will eventually generate LLVM output. It would also be reasonable to assume you want to write a UI interface that is used in the LLVM output so a user could, for example, write a program that displays a UI to the user. Each style has little to no overlap.

I'm not sure of where your inferences about haskell/lisp/ecs came from, at no point was I pushing for any of their models. Quite frankly I don't like any of their models and I'd think a UI library like that should just use polymorphic variants. They are simple, extendable, etc..., but not like anything like the style that you asked about, hence why I did not bring them up in the question post at all.

elliottslaughter commented 5 years ago

Please calm down, people. You're posting so much I don't even have time to read what you're writing.

@capr What's wrong with the workaround I posted in #339?

I'm still trying to understand the rest of your issues from the original post.

capr commented 5 years ago

@elliottslaughter the context for that workaround not working is the following: how do you build a bunch of Terra methods that reference each other in their code? So far I know of only two ways to do that:

A) do this in 2 passes: first, build all the methods with no body, then on a second pass, build them again in whichever order you want, calling resetdefinition() over the stubs from the first pass as you go.

B) compile them in one pass, hooking up __methodmissing to compile the dependencies recursively (a form of lazy typechecking if you will). You would still have to respond with a stub on recursive references (since the method that is currently compiling is the one triggering __methodmissing), but that's ok, you will replace that after the compilation with the real body using resetdefinition() (note that in this case you really can't infer the return-type, Terra throws an error here too, that's ok).

With method A you need to sacrifice return-type inference. The workaround in #339 doesn't work with that, since you need the function to be compiled with it's actual body in order to infer the return type but you're building stubs.

With method B you need to know which function definition to pick in __methodmissing in case it's asking for an overloaded function since you can't respond with the overloadedterrafunction object itself (not supported, thows an error), and picking the right definition is not a trivial decision (see tryinsertcasts in terralib.lua).

Is there another way to skin this cat without sacrificing neither return-type inference (method A) nor overloaded functions (method B)?

PS: My original post was also intended to document how Terra could open up its internals more in order to make language extensions easier, something which is listed as a TODO item in one of the papers IIRC.

capr commented 5 years ago

Also note: method B allows a bonus optimization: since it allows you to know which of the methods of a type are never called by other methods of that same type, you know that you don't have to make those methods virtual in that type, because overriding them in a descendant would have no polymorphic effect in that class. This is yet another case for improving the Terra reflection API instead of relying on hacks over __methodmissing. (this is wrong)

elliottslaughter commented 5 years ago

Ok, I think I'm still a little confused.

First some observations:

  1. Recursively defined single functions already don't support return type inference. I.e. this fails: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_single_function-t

    But this works: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_single_function_explicit_return-t

    Note this also applies to methods. This fails: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_single_method-t

    But this works: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_single_method_explicit_return-t

  2. Prototypes on methods seem to break. This should probably be filed as a separate bug. (And anyway, even if this worked it wouldn't give you return type inference because prototypes need an explicit return type.)

    This breaks: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_single_method_prototype-t

    (Edit: This was actually already filed as #338.)

It seems to be that return type inference is not available in situations where you want mutual recursive functions in general? I'm not sure there's a reasonable way to solve this, the fixpoint of the return type could be undefined even if the meet operator is well defined. (Though maybe I'm missing something here.)

So it doesn't seem super surprising that if you add overloaded functions/methods to the mix that this wouldn't be magically fixed.

  1. This fails: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_multi_function-t

    But this works: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_multi_function_prototype-t

I'm trying to figure out what you mean by __methodmissing. I guess the thing I'm stuck on is that __methodmissing is a macro. You therefore return quotes out of it; not functions, overloaded functions, methods or what have you. There is some question, I suppose, of whether you can define things in the right order. But at least when I set everything up manually I can get it to work. The following isn't pretty, but it does work.

  1. This works: https://gist.github.com/elliottslaughter/35f2bf7fc0183c109fe07546f1dd55ee#file-terra_method_missing-t

So it seems to me that return type inference may already be a lost cause (if you want mutual recursion, perhaps not otherwise) and that if you set things up in the right order it should be possible to make things work out.

If I'm misunderstanding, maybe you can construct a similar minimal example to the ones above that shows what you're trying to do?

capr commented 5 years ago

Ok, let me clarify the issue with return type inference: when you have recursive references, i.e. f calling f (or f calling g calling f) that's when you can't infer the return type since you need to typecheck the call to f in order to finish compiling f. So I agree with you there, I don't expect it to work in this case.

With that out of the way. let's go back to the original problem: In my language extension I have a bunch of function definitions that reference each other but they're not recursive references, it's just that the definitions can appear in random order so I don't know in which order I should compile them. That's the problem which the two solutions I mentioned (A and B) is trying to solve. Does that make sense?

elliottslaughter commented 5 years ago

Would it be ok to use a solution in which the user is expected to order the methods so that definition precedes use? It seems like if you start from that assumption it should be possible to make something work. Otherwise we're again treading into territory that Terra doesn't have great support for in general---which isn't to say it'll be impossible, but we're probably talking about something like AST introspection to figure out the call graph.

capr commented 5 years ago

A few notes on that:

Terra already handles random definition order as long as there's no Lua code between definitions[1], so I would consider it a downgrade not to have that in an extension language. Also, from a user's pov it's frustrating to know that you have to resort to hacks to achieve something that's already in terralib but you cannot program with it. So for me the best way to solve this would be to expose that functionality through an API that allows one to create multiple interdependent definitions in random order and typecheck them in bulk. No more resetdefinition() or __methodmissing hacks.

[1] a mild source of head-scratching for me as I was learning the language; the old and syntax wasn't great, I get it but this "invisible complation unit" stuff is even more weird. I wonder why was that preferred over an enclosing syntax that could also force the issue that no Lua variables can be defined or altered in between defs.

capr commented 5 years ago

I ended up using solution A (which means I gave up on return-type inference in order to have arbitrary definition order and auto-overloading, and overriding on individual definitions of overloaded functions), WIP code is here https://github.com/luapower/oops

elliottslaughter commented 5 years ago

It may be that there is an intermediate solution between the original version of lazy type checking (where we couldn't even tell you the type of a quote in all cases) and the current eager type checking (which forces forward declarations when you're not defining functions back to back). I probably won't have time to look into it personally in any sort of reasonable time frame. But it's something that someone else could plausibly take a look at.

capr commented 5 years ago

I understand. Thanks for all the help so far.