typelead / eta

The Eta Programming Language, a dialect of Haskell on the JVM
https://eta-lang.org
BSD 3-Clause "New" or "Revised" License

Aggressively clear local references #79

Open rahulmutt opened 8 years ago

rahulmutt commented 8 years ago

Tips on managing lazy garbage, from a conversation started by @ashishnegi:

ashishnegi [6:41 PM]
hi.. I have heard that, being a lazy language on the JVM, we add a lot of garbage.. for faster garbage collection of intermediate collections we give some hints to the GC in bytecode.. Can I get some reference to how it is done? any help appreciated..

alexmiller [6:43 PM]
@ashishnegi: the compiler aggressively clears local references to make more objects available for GC. That is, it creates bytecode that clears references; a lot of that is done via the Util.ret calls..

Relevant links: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L5046 https://groups.google.com/forum/#!searchin/clojure/local$20clearing|sort:relevance/clojure/FLrtjyYJdRU/itFqPRk0BrcJ http://stackoverflow.com/questions/15994316/clojure-head-retention
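The Clojure idiom alexmiller refers to can be sketched in plain Java. The `ret1` helper below mirrors `clojure.lang.Util.ret1`; the surrounding class and method names are invented for illustration. The trick relies on Java's left-to-right argument evaluation: the value of `x` is read first, then the slot is nulled, so the callee receives the value while the caller's frame no longer pins it.

```java
public class LocalsClearing {
    // Mirrors clojure.lang.Util.ret1: returns the first argument,
    // evaluating (and discarding) the second purely for its side effect.
    static Object ret1(Object ret, Object nil) {
        return ret;
    }

    static Object lastUse(Object x) {
        // Without clearing, the local slot `x` would keep its object
        // reachable for the rest of this frame's lifetime. The idiom
        // `ret1(x, x = null)` passes the value along while nulling the
        // slot, so the object can be collected even while process(...)
        // is still running.
        return process(ret1(x, x = null));
    }

    static Object process(Object o) { return o; }

    public static void main(String[] args) {
        System.out.println(lastUse("payload")); // prints "payload"
    }
}
```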

This issue should be tackled after #23 since that ticket will make lots of nontrivial changes to the generated bytecode.

rahulmutt commented 7 years ago

Since an initial implementation of #23 has been done on the context-switch branch and its basic consequences have been observed, I have a much better understanding of this problem. In Eta, the generated methods can be fairly large because of extensive inlining: the IO monad's bind is inlined when -O2 is turned on. This is part of the reason Eta will probably have the best-performing monads among the existing JVM languages [1].

Moreover, Eta's bytecode has a concept of "checkpoints". A checkpoint occurs whenever a lazy value is evaluated through a case, and it is used to take care of Eta exceptions by unwinding the stack. While working on #23, it was found that some methods are so extensively inlined that there can be almost 80 checkpoints in a single method! [2] This means some methods will run for a long time, and for such methods we must do aggressive clearing or there will be severe memory leaks.

The basic outline for how we go about this:

  1. When generating code for a checkpoint, we analyze the expressions that follow each of the branches and find all the bindings stored in local variables that will not be used later in any of the branches. These are cleared just before evaluation happens.
  2. When generating code for the branches that follow evaluation, we analyze each branch and find which variables are not needed at all in that branch and can be safely cleared.
  3. For closures whose entry code is linear (i.e. no control-flow branching), the code will typically end with a "tail call" which may go on for a while. All local variables in the method should be cleared before making this call. This case is not handled by (1) because no direct evaluation is happening; an application is happening.
  4. The argument stack may carry stale references. Say you add context.R(5, arg). If you never modify context.R(5) later in the program, arg keeps a strong reference and hence will never be GC'd! Thus, there must be regular checkpoints (say, when a new stack frame is pushed) where the argument stack is cleared of references.
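Point (4) can be illustrated with a toy register file. `StaleRegisters` is a stand-in of my own invention, not Eta's actual runtime `Context` class, and `System.gc()` is only a hint to the collector, so the "after clearing" behavior is typical rather than guaranteed:

```java
import java.lang.ref.WeakReference;

public class StaleRegisters {
    // Toy stand-in for an Eta-style context: an array of object
    // "registers" used to pass arguments between stack frames.
    static final Object[] R = new Object[8];

    public static void main(String[] args) {
        Object arg = new byte[1024];
        WeakReference<Object> ref = new WeakReference<>(arg);
        R[5] = arg;   // pass an argument via register slot 5
        arg = null;   // caller drops its own reference

        // Even after the callee is done with R[5], the slot still holds
        // a strong reference, so the payload cannot be collected.
        System.gc();
        System.out.println("reachable before clearing: " + (ref.get() != null));

        R[5] = null;  // checkpoint: clear stale argument slots
        System.gc();
        // The payload is now only weakly reachable and is eligible for
        // collection (usually collected immediately on HotSpot).
        System.out.println("reachable after clearing:  " + (ref.get() != null));
    }
}
```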

Let's take a look at an example to make this more concrete [3]:

```haskell
contrived :: Int -> Int -> Int -> Int
contrived x y z = case x > 1 of
  True -> case y > 2 of
    True -> y + z
    False -> y
  False -> sum [z, 1, 2]
```
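To sketch what the analysis in steps (1) and (2) would insert, here is `contrived` rendered in plain Java, with boxed `Integer` arguments standing in for the heap-allocated values Eta would hold in locals. In real Eta code each branch ends in a potentially long-running tail call, which is where the clearing pays off; here the returns are immediate, so the nulling is purely illustrative:

```java
public class Contrived {
    static int contrived(Integer x, Integer y, Integer z) {
        boolean cond = x > 1;
        x = null;             // step (1): x is dead in every branch, clear before branching
        if (cond) {
            // y is live in both sub-branches, z in one: neither cleared yet
            if (y > 2) {
                return y + z;
            } else {
                z = null;     // step (2): z is unused in this branch
                return y;
            }
        } else {
            y = null;         // step (2): y is unused in this branch
            return z + 1 + 2; // sum [z, 1, 2]
        }
    }

    public static void main(String[] args) {
        System.out.println(contrived(2, 3, 4)); // 7
        System.out.println(contrived(2, 1, 4)); // 1
        System.out.println(contrived(0, 3, 4)); // 7
    }
}
```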

This entire logic also goes hand-in-hand with making #23 more feasible. It was found that the naive implementation of context switching causes a code explosion, and in some cases methods came close to the 65,535-byte JVM size limit for a single method. By saving only the references that absolutely need to be saved, we both free up references more eagerly for GC and keep the generated bytecode minimal.

[1] Once we implement more optimisations, of course!

[2] In fact, those methods had some pathological case that caused codec-jvm to either generate bad code or go into an infinite loop.

[3] For simplicity, we ignore optimizations like the worker-wrapper transformation, which would reduce the number of case evaluations that happen.

[4] You may think here that we may just as well re-use either {0} or {4} and you'd be right. This is another optimization which will significantly reduce the amount of space a given Eta stack frame will take up. This will also reduce the amount of data that is saved on a context switch.