typelead / eta

The Eta Programming Language, a dialect of Haskell on the JVM
https://eta-lang.org
BSD 3-Clause "New" or "Revised" License

Aggressively clear local references #79

Open rahulmutt opened 8 years ago

rahulmutt commented 8 years ago

Tips on managing lazy garbage, from a conversation started by @ashishnegi:

ashishnegi [6:41 PM]
hi.. I have heard that, being a lazy language on the JVM, we add a lot of garbage.. for faster garbage collection of intermediate collections we give some hints to the GC in bytecode.. Can I get some reference to how it is done? any help appreciated..

alexmiller [6:43 PM]
@ashishnegi: the compiler aggressively clears local references to make more objects available for GC. That is, it creates bytecode that clears references; a lot of that is done via the Util.ret calls..

Relevant links: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L5046 https://groups.google.com/forum/#!searchin/clojure/local$20clearing|sort:relevance/clojure/FLrtjyYJdRU/itFqPRk0BrcJ http://stackoverflow.com/questions/15994316/clojure-head-retention
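The Clojure idiom alexmiller refers to can be sketched in plain Java. The `ret1` helper below mirrors `clojure.lang.Util.ret1`; the surrounding class and method names are invented for illustration. The trick relies on Java's left-to-right argument evaluation: the value of `x` is read first, then the slot is nulled, so the callee receives the value while the caller's frame no longer pins it.

```java
public class LocalsClearing {
    // Mirrors clojure.lang.Util.ret1: returns the first argument,
    // evaluating (and discarding) the second purely for its side effect.
    static Object ret1(Object ret, Object nil) {
        return ret;
    }

    static Object lastUse(Object x) {
        // Without clearing, the local slot `x` would keep its object
        // reachable for the rest of this frame's lifetime. The idiom
        // `ret1(x, x = null)` passes the value along while nulling the
        // slot, so the object can be collected even while process(...)
        // is still running.
        return process(ret1(x, x = null));
    }

    static Object process(Object o) { return o; }

    public static void main(String[] args) {
        System.out.println(lastUse("payload")); // prints "payload"
    }
}
```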

This issue should be tackled after #23 since that ticket will make lots of nontrivial changes to the generated bytecode.

rahulmutt commented 7 years ago

Since an initial implementation of #23 has been done on the context-switch branch and its basic consequences have been observed, I have a much better understanding of this problem. In Eta, the generated methods can be fairly large because of extensive inlining: the IO monad's bind is inlined when -O2 is turned on. This is part of the reason Eta will probably have the best-performing monads among the existing JVM languages [1].

Moreover, Eta's bytecode has a concept of "checkpoints". A checkpoint occurs whenever a lazy value is evaluated through a case, and it is used to take care of Eta exceptions by unwinding the stack. While working on #23, it was found that some methods are so extensively inlined that there can be almost 80 checkpoints in a single method! [2] This means some methods will run for a long time, and for such methods we must do aggressive clearing or there will be severe memory leaks.

The basic outline for how we go about this:

  1. When generating code for a checkpoint, we analyze the expressions that follow each of the branches and find all the bindings stored in local variables that will not be used later in any of the branches. These are cleared just before evaluation happens.
  2. When generating code for the branches that follow evaluation, we analyze each branch and find which variables are not needed at all in that branch and can be safely cleared.
  3. For closures whose entry code is linear (i.e. no control-flow branching), the code will typically end with a "tail call" which may go on for a while. All local variables in the method should be cleared before making this call. This case is not handled by (1) because no direct evaluation is happening; an application is happening.
  4. The argument stack may carry stale references. Say you add context.R(5, arg). If you never modify context.R(5) later in the program, arg keeps a strong reference and hence will never be GC'd! Thus, there must be regular checkpoints (say, when a new stack frame is pushed) where the argument stack is cleared of references.
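Point (4) can be illustrated with a toy register file. `StaleRegisters` is a stand-in of my own invention, not Eta's actual runtime `Context` class, and `System.gc()` is only a hint to the collector, so the "after clearing" behavior is typical rather than guaranteed:

```java
import java.lang.ref.WeakReference;

public class StaleRegisters {
    // Toy stand-in for an Eta-style context: an array of object
    // "registers" used to pass arguments between stack frames.
    static final Object[] R = new Object[8];

    public static void main(String[] args) {
        Object arg = new byte[1024];
        WeakReference<Object> ref = new WeakReference<>(arg);
        R[5] = arg;   // pass an argument via register slot 5
        arg = null;   // caller drops its own reference

        // Even after the callee is done with R[5], the slot still holds
        // a strong reference, so the payload cannot be collected.
        System.gc();
        System.out.println("reachable before clearing: " + (ref.get() != null));

        R[5] = null;  // checkpoint: clear stale argument slots
        System.gc();
        // The payload is now only weakly reachable and is eligible for
        // collection (usually collected immediately on HotSpot).
        System.out.println("reachable after clearing:  " + (ref.get() != null));
    }
}
```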

Let's take a look at an example to make this more concrete [3]:

```haskell
contrived :: Int -> Int -> Int -> Int
contrived x y z = case x > 1 of
  True -> case y > 2 of
    True -> y + z
    False -> y
  False -> sum [z, 1, 2]
```
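To sketch what the analysis in steps (1) and (2) would insert, here is `contrived` rendered in plain Java, with boxed `Integer` arguments standing in for the heap-allocated values Eta would hold in locals. In real Eta code each branch ends in a potentially long-running tail call, which is where the clearing pays off; here the returns are immediate, so the nulling is purely illustrative:

```java
public class Contrived {
    static int contrived(Integer x, Integer y, Integer z) {
        boolean cond = x > 1;
        x = null;             // step (1): x is dead in every branch, clear before branching
        if (cond) {
            // y is live in both sub-branches, z in one: neither cleared yet
            if (y > 2) {
                return y + z;
            } else {
                z = null;     // step (2): z is unused in this branch
                return y;
            }
        } else {
            y = null;         // step (2): y is unused in this branch
            return z + 1 + 2; // sum [z, 1, 2]
        }
    }

    public static void main(String[] args) {
        System.out.println(contrived(2, 3, 4)); // 7
        System.out.println(contrived(2, 1, 4)); // 1
        System.out.println(contrived(0, 3, 4)); // 7
    }
}
```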

This entire logic also goes hand-in-hand with making #23 more feasible. It was found that the naive implementation of context switching causes a code explosion, and in some cases methods came close to the 65,535-byte JVM size limit for a single method. By saving only the references that absolutely need to be saved, we both free up references more eagerly for GC and keep the generated bytecode minimal.

[1] Once we implement more optimisations, of course!

[2] In fact, those methods had some pathological case that caused codec-jvm to either generate bad code or go into an infinite loop.

[3] For simplicity, we ignore optimizations like the worker-wrapper transformation, which would reduce the number of case evaluations that happen.

[4] You may think here that we may just as well re-use either {0} or {4} and you'd be right. This is another optimization which will significantly reduce the amount of space a given Eta stack frame will take up. This will also reduce the amount of data that is saved on a context switch.