vnmakarov / mir

A lightweight JIT compiler based on MIR (Medium Internal Representation) and C11 JIT compiler and interpreter based on MIR
MIT License
2.29k stars 145 forks source link

Lambdas and Coroutines in C2M #228

Open darko20 opened 3 years ago

darko20 commented 3 years ago

God forbid that Mir C be turned into a C++ compiler, but two C++ features are incredibly useful: coroutines and lambdas (also see blocks in clang). I mention these together because I assume they have similar implementation issues with locals and arguments.

What are the chances of supporting these?

vnmakarov commented 2 years ago

Lambdas can be implemented in MIR. The only important thing is not implemented tail call optimization in MIR-generator. I am going to implement this some time in the future. Access to closure data can be implemented in MIR as it could be done in assembler.

Coroutines require to design and implement some extensions on MIR-level. I don't know whether I implement it or not. I need a convincing case to do this.

But as I understand that you are also proposing to implement some extensions to C supported by MIR2C compiler. It is even harder for me to say right now will I have time to implement it in the future. And there is also an open question what extension should be used for lambdas: GCC nested functions or C++ lambda syntax. Glibc uses nested functions a lot but the developers want to remove them although it is very hard. I see a tendency that people are considering now a past decision to use GCC nested functions (in glibc and some other widely used libraries) as a mistake.

darko20 commented 2 years ago

Probably the best argument for coroutines, assuming a 'stackless' C++20 style implementation is that you get simple, low or zero overhead concurrency which is self-contained and doesn't require host support. This fits in with the light, quick and portable architecture of MIR. They are also life savers when you have mismatched 'push' and 'pull' style code and you avoid having to rewrite deeply nested loops into incremental functions.

I don't think you should spend time on the C compiler. I think c2mir.c should be refactored so that there is an independent interface to the context checking and code generation phases. That way people can write their own language front-ends or generate code programmatically without having to generate strings of C code, if they don't want to write low level MIR code. Updates to the code generator would then be the most welcome contribution along with any new features such as these.

I will give that refactoring a go. I think a procedural interface which allows you to use your own AST so you don't have mirror it to generate code is optional but I'm not sure how complicated that will be.

eliphatfs commented 2 years ago

In my opinion coroutines and lambdas are higher than the MIR level. In languages like scala, C#, and perhaps many more of them, they are transformed into functions and structs in very high level IR (AST with bindings resolved). Lambda is lifted into context capturing classes and member functions while coroutines are transformed into finite state machines. I think that MIR is like machine independent but instruction-level stuff, and thus is not the best place to handle these.

darko20 commented 2 years ago

Some languages have models that can accomodate these sorts of feature approximations (Scala, for instance, isn't exactly a systems programming language), but there is a price to pay in performance, code size and implementation complexity. Coroutines in particular are fairly pointless without low level support, if you want to avoid messy hacks.

Blealtan commented 2 years ago

FYI C++ lambda expressions are just closure objects and require no low-level support at no cost; in contrast, coroutines in C++20 require only several intrinsic functions that allow control flow to suspend and resume. As an example, LLVM provides a set of coroutine intrinsics.

But it is possible to implement coroutines without such intrinsics just like what Scala/C#/etc. It basically requires some lowering procedure that transforms a coroutine into several normal functions, which together form a state machine. While LLVM provides the above intrinsics in its IR, it indeed internally lowers such coroutine to three normal functions, as is mentioned in the above documentation. The design of stackless coroutine intentionally allows such transformation, to gain better performance by avoiding stack manipulation. It is the stackful coroutine that requires messy hacks to implement without low-level support.

It thus leaves a design decision that whether the transformation should be done in MIR compilation or the language frontends. If the former, then design and add some intrinsics as a MIR extension; otherwise, just do the lowering before generating your MIR code in, e.g. C2MIR (or CXX2MIR up to this point?). As you have mentioned, code size and implementation complexity (of language frontend) are the main differences. Performance is hardly impacted, as the coroutine-to-normal code transformation has to be done earlier or later.