grin-compiler / grin

GRIN is a compiler back-end for lazy and strict functional languages with whole program optimization support.
https://grin-compiler.github.io/

Counting Immutable Beans #82

Open andorp opened 4 years ago

andorp commented 4 years ago

Implementation of the Counting Immutable Beans (CIB) for the GRIN compiler.

Summary

CIB instruments the original program: four new instructions are inserted into the syntax via instrumentation. These fall into two categories.

Inc, dec:

In the CIB approach every heap location has a reference counter associated with it. Inc increments the counter for the location, and also transitively increments all the referred locations. Dec decrements the counter for the location, and also transitively decrements all the referred locations.
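The transitive inc/dec above can be modeled with a short sketch. The `Cell`/`Loc` types and dict heap below are illustrative assumptions, not the GRIN runtime; the recursion also relies on the heap being acyclic, which holds because CIB values are immutable:

```python
# Toy model of CIB's transitive inc/dec (illustrative, not the GRIN runtime).
# The heap is a dict from address to Cell; fields are either Loc values
# (heap references) or primitives. Immutable values form an acyclic heap,
# so the recursion terminates.

class Loc:
    def __init__(self, addr):
        self.addr = addr

class Cell:
    def __init__(self, fields):
        self.rc = 1          # reference counter, starts at 1 on allocation
        self.fields = fields

def inc(heap, loc):
    """Increment loc's counter and, transitively, every referred location."""
    cell = heap[loc.addr]
    cell.rc += 1
    for f in cell.fields:
        if isinstance(f, Loc):
            inc(heap, f)

def dec(heap, loc):
    """Decrement loc's counter and, transitively, every referred location."""
    cell = heap[loc.addr]
    cell.rc -= 1
    for f in cell.fields:
        if isinstance(f, Loc):
            dec(heap, f)
```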

Reset, reuse:

From the CIB paper:

let y = reset x.

If x is a shared value, then y is set to a special pointer value BOX. If x is not shared, then reset decrements the reference counters of the components of x, and y is set to the heap location associated with x.

let z = reuse y in ctor_i w.

If y is BOX, reuse allocates a new heap cell for the constructor. If y is not BOX, the runtime reuses the heap location to store the constructor.
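Under the same toy-model assumptions (an illustrative `Cell`/`Loc` heap, not the GRIN or Lean runtime), reset and reuse as described above might look like:

```python
# Toy model of reset/reuse as described above (illustrative only).
# BOX is the special pointer value returned by reset on shared values.

BOX = object()

class Loc:
    def __init__(self, addr):
        self.addr = addr

class Cell:
    def __init__(self, tag, fields):
        self.rc = 1
        self.tag = tag
        self.fields = fields

def dec(heap, loc):
    """Transitive decrement, as in the inc/dec sketch."""
    cell = heap[loc.addr]
    cell.rc -= 1
    for f in cell.fields:
        if isinstance(f, Loc):
            dec(heap, f)

def reset(heap, x):
    """let y = reset x: BOX if x is shared; otherwise x's location,
    with the counters of x's components decremented."""
    cell = heap[x.addr]
    if cell.rc > 1:
        return BOX
    for f in cell.fields:
        if isinstance(f, Loc):
            dec(heap, f)
    cell.fields = []
    return x

def reuse(heap, y, tag, fields, alloc):
    """let z = reuse y in ctor w: allocate fresh if y is BOX, otherwise
    overwrite the cell at y in place. `alloc` allocates a fresh cell."""
    if y is BOX:
        return alloc(heap, Cell(tag, fields))
    cell = heap[y.addr]
    cell.tag, cell.fields, cell.rc = tag, fields, 1
    return y
```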

Application of the same idea for GRIN:

Differences: while Lean's IR puts every variable on the heap, GRIN treats variables as registers, and only a subset of the registers are associated with heap locations. A register is associated with a heap location if its type is Loc. This means the GRIN implementation of the CIB approach needs a type environment that tells which variables can be affected by the CIB operations.

In GRIN:

We need to add 4 new instructions:

The GRIN nodes store primitive values, but the runtime distinguishes location values from primitive values; thus it is able to compute the transitive closure of the reachability relation of a location and manipulate its reference counters.

Each of the four instructions needs to be implemented in the GRIN runtime/interpreter.

In the original paper, reuse of a constructor could happen only if the arities of the constructors are the same. But in GRIN, the runtime needs to allocate heap cells based on the type of the heap location. This means every heap location can have its own arity, and reuse of the heap location is possible only if the new node does not exceed the arity of the heap node. Otherwise a new node needs to be allocated, with the maximum arity.

The most important change is the reuse transformation: it changes certain instances of the store operation into the reuse operation.

Before:

```
x <- store y;
```

After:

```
z <- reset w;
...
x <- reuse z y;
```

In this case we decide to reuse the heap location associated with w only if w can accommodate all the possible values of x, i.e. max-arity(w) >= max-arity(x). While Lean's approach uses the arity of the constructors in the case alternatives, we can use the abstract information covering all possible runs.
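The decision above can be sketched as a small predicate. The names here are hypothetical: `hpt` stands in for the heap-points-to abstract result, represented as a plain dict from variable to the set of node layouts it may take:

```python
# Hypothetical sketch of the reuse decision: rewrite `x <- store y`
# into reset/reuse of w's cell only when w's heap location can hold
# every node x may take, i.e. max-arity(w) >= max-arity(x).
# `hpt` stands in for the heap-points-to abstract result.

def max_arity(possible_nodes):
    """Largest field count among the nodes a location may contain."""
    return max((len(fields) for fields in possible_nodes), default=0)

def can_reuse(hpt, w, x):
    """True when the heap location of w can accommodate all possible
    values of x over all abstract runs."""
    return max_arity(hpt[w]) >= max_arity(hpt[x])
```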

Implementation steps:

  1. Import the abstracting definitional interpreters from the mini-grin repo
  2. Change the implementation to use base functors instead of Expr datatype
  3. Implement reference statistics with the new interpreter, as a warm-up exercise
  4. Implement CIB program instrumentation for GRIN producing ExprF :+: CibF AST
  5. Implement interpreter for CIB extended GRIN program
  6. Extra: Implement LLVM codegen plugin for CIB instructions
bjorn3 commented 4 years ago

Reference counting may have a higher total overhead than garbage collection. I don't know if reusing allocations saves enough time to make its overhead smaller than that of garbage collection. I think it would be really useful to benchmark CIB against GC once it is implemented.

andorp commented 4 years ago

There are some preliminary results in the Lean paper: https://arxiv.org/pdf/1908.05647.pdf

andorp commented 4 years ago

About comparing GC with CIB: we can't do that measurement yet, as we only have a simple runtime implemented in C and LLVM whose only memory management is allocation.

I agree that it would be nice to have something like a GC. What stops us? Mainly that our focus currently is not on the LLVM and runtime implementation. Why? We are changing the AST to accommodate datalog as a possible way of implementing different kinds of HeapPointsTo analyses. The current proof-of-concept implementation can't handle programs bigger than 70k AST nodes.

The project is based on the idea that whole program analysis of GRIN is equivalent to modern pointer analysis problems. Of course we can opt out of whole program analysis and run analyses at the procedure or module level, as other compilers do, and generate LLVM code from that level of optimized code. But our personal goal is to apply whole program analysis to industrial-sized projects.

I think a simple GC-supported GRIN runtime implemented in LLVM would be a nice master's thesis for someone who is interested in such a topic. The implementation of the CIB runtime in LLVM for GRIN could be another bachelor's or master's thesis.

Please shout if you know anybody who would be interested in doing such a thing. :)

Avi-D-coder commented 4 years ago

I have been researching, and slowly building, an FP-oriented concurrent copying GC for Rust (see the Sundial design document). One of the project goals is to enable its use as the foundation of FP language runtimes, and I suspect in the long run it would be a good fit. It requires both direct and transitive points-to type information in order to achieve pauseless concurrent collection. While Sundial will support vtable/runtime polymorphism, it will come at a higher cost. My interpretation of the docs folder is that GRIN's points-to analysis provides this type info?

On a related note, is update always done with release ordering? Is a more efficient C11-style memory model planned?

savuori commented 2 years ago

On the other hand, reference counting seems to be a prerequisite for the in-place mutation optimization that Koka (FBIP) and Roc use. That could be a really interesting optimization as well.

andorp commented 2 years ago

I discontinued this development for the sole reason of not having enough time and resources in my life right now; it would require more than I can allocate to it. I am more than happy to discuss my learnings with anyone who would like to pick this up.

GunpowderGuy commented 1 month ago

> I have been researching, and slowly building, an FP-oriented concurrent copying GC for Rust (see the Sundial design document). One of the project goals is to enable its use as the foundation of FP language runtimes, and I suspect in the long run it would be a good fit. It requires both direct and transitive points-to type information in order to achieve pauseless concurrent collection. While Sundial will support vtable/runtime polymorphism, it will come at a higher cost. My interpretation of the docs folder is that GRIN's points-to analysis provides this type info?
>
> On a related note, is update always done with release ordering? Is a more efficient C11-style memory model planned?

Would Grin exploit your GC by compiling to rust code that uses its api?

Avi-D-coder commented 4 weeks ago

@GunpowderGuy that was not what I was thinking, but sadly the research was never finished so nothing can use it.