reactorlabs / sourir


Optimizing vector.sou #157

Open mhyee opened 7 years ago

mhyee commented 7 years ago

From https://github.com/reactorlabs/sourir/pull/131#issuecomment-306560111

Looking at the result of inlining vector.sou reveals plenty of opportunities for making the optimizations a bit more aggressive, in particular constant folding. Consider for example (this is the output of --opt inline,cont_fold):

 array a2[0]
 [...]
 var args_11_1 = a2
  [...]
 branch (args_11_1 == nil) $error_1_22_1 $checklen_1

This branch should be eliminated. Later:

 arrlen_1 <- length(args_11_1)
 branch (arrlen_1 < 0) $error_2_22_1 $loop_1a_1

This test can also be eliminated.
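For instance, assuming the folder learns from array a2[0] that args_11_1 is a non-nil array of length 0, both conditions fold to false; a hand-written sketch of what the folded code could look like (keeping the original names):

 array a2[0]
 [...]
 var args_11_1 = a2
  [...]
 goto $checklen_1
 [...]
 arrlen_1 <- length(args_11_1)
 goto $loop_1a_1

A later goto-threading / dead-block pass could then also drop the unreachable $error_* blocks.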

This is precisely the sort of thing that Jan mentioned early in the project: the code that gets produced by speculation/traces for a dynamic language tends to be very ugly, and simple optimizations can do a lot. I think it would be nice to do something here.

Also https://github.com/reactorlabs/sourir/pull/131#issuecomment-306561430

The same file also has (after inlining and constant-folding):

branch true $error_1_21_1 $checklen

which suggests the cleanup passes should also be strengthened. (Note that the enforced label invariants help us here: we know that $checklen is dead code and that the block at $error_1_21_1 can simply be moved in its place.)
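A minimal sketch of that cleanup, written by hand: fold the constant branch into a goto, drop the now-dead block at $checklen, and let a later pass splice the $error_1_21_1 block in its place.

 # before
 branch true $error_1_21_1 $checklen
 # after: the $checklen block is dead and can be removed,
 # and the $error_1_21_1 block can then be moved up here
 goto $error_1_21_1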

There are probably other cases we can consider.

gasche commented 7 years ago

So I played a bit with better array shape approximations, but my code is a mess right now (it's just a prototype), and it is more about investigating what can easily be done.

I discovered patterns of an interesting form that we cannot currently optimize (the comments come from inlined code, but they help make sense of what is happening):

 # This is an internal helper, only used by `obj_add` and `obj_find`.
 # Declare variables.
 var map = nil
 # Check if `obj` is nil.
 map <- nil
 goto begin
begin:
 # If the map entry is empty, we're done.
 branch (map == nil) $error_2 $loop
$loop:
 # Check if the current entry is the right one.
 var candidate = map[0]
 branch (candidate == 1111) $done_1 $next
$next:
 # We did not find the right entry, so try the next one.
 # Get the next entry
 drop candidate
 map <- map[2]
 goto begin
$done_1:
 # We found the right entry, so return its value.
 res <- map
 [...]
$error_2:
 res <- nil
 [...]

This code loops over the linked-list map, stopping when it becomes nil. But the linked list is already nil at the beginning, so in theory the whole loop can be removed.

In practice that does not happen because the loop-exit condition map == nil is within the loop (after the loop label begin), and the body of the loop assigns the variable map (to the tail of the linked list) when it runs. Constant folding will realize that in the else-branch of the test it cannot predict the value of map, and when merging back at the label begin it will thus mark map as unpredictable -- and not optimize the test.

If we wanted this optimization to happen, we would need a constant-folding flow analysis where the current approximation environment influences the control-flow traversal (and not just the values being propagated): if we have branch e $l1 $l2 and e approximates to true, then we should continue computing the fixpoint in $l1 but not in $l2 (so if $l2 was not traversed yet, it remains counted as dead code).
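On the snippet above, such a flow-directed fold would see map = nil on the only edge reaching begin, fold the branch to true, and never traverse $loop, $next or $done_1. Assuming a cleanup pass also removes the untraversed blocks, a hand-written sketch of the output it could produce:

 var map = nil
 map <- nil
 goto begin
begin:
 # map is nil on the only edge reaching begin, so the branch folds to true;
 # $loop, $next and $done_1 were never traversed and stay dead
 goto $error_2
$error_2:
 res <- nil
 [...]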

gasche commented 7 years ago

(Right now the resulting code has one small defect: it does not quite have the same semantics as the input code. But that's to be fixed and, besides, the point is that it's branch-free, right?)

(The reason why the code is broken is that array approximations don't currently take aliasing into account. But that's not too hard to fix, by approximating the heap, I suppose.)
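A minimal, made-up example of the aliasing issue (not from vector.sou, and I am assuming the obvious a[i] <- e update syntax): if two variables name the same array, folding reads through one of them after a write through the other is unsound unless the approximation tracks the heap cell rather than the variable.

 var a = [1, nil]
 # b aliases the same array as a
 var b = a
 b[0] <- 2
 # without a heap approximation, the folder could still believe a[0] == 1
 # here and wrongly fold this branch to true
 branch (a[0] == 1) $stale $ok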

gasche commented 7 years ago

So fixing the aliasing handling reveals that if you want to optimize well, you need constant-folding-directed loop unrolling (and not just constant-folding-directed branch taking). Consider a linked list with a single cell:

var map = [1111, nil]

And a loop that wants to find the key 1113 in it, or learn that it does not exist:

begin:
  branch (map == nil) $key_does_not_exist $try_cell  # loop branch
$try_cell:
  var key = map[0]
  branch (key == 1113) $key_exists $try_next   # key branch
$try_next:
  map <- map[1]
  goto begin

When constant folding runs over this with the known value of map above, it encounters the loop branch and predicts false; then it also correctly predicts that the key branch is false, goes to $try_next, and correctly updates the static information to map = nil. Perfect. But then it reaches begin again and notices that the branch is now true (in the general case it may not know; the point is that false is no longer a valid prediction), so the merge of the two flows of control pessimizes the static information and prevents removing the branches and the loop: the static information at begin when you reach the fixpoint is "I don't know what map is".

If we wanted to get branch-free code out of this (like you would collect in a tracelet), we would have to duplicate begin into several copies based on static information: the second time we jump to begin, we could instead jump to a copy of it and continue analyzing there. This corresponds to on-demand unrolling, where the demand comes from the optimizer -- in effect a symbolic evaluator. (In general this may not terminate, for example for (int i = 0; true; i++) could iterate forever, but we could have a per-label cutoff bound.)
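For the single-cell map above, one unrolled copy of begin would be enough: every branch folds, and a hand-written sketch of the branch-free result (assuming dead blocks are cleaned up) is roughly:

 var map = [1111, nil]
 # first iteration: map != nil and map[0] is 1111, not 1113
 var key = map[0]
 map <- map[1]
 # unrolled copy of begin: map is now known to be nil
 goto $key_does_not_exist

Folding the remaining straight-line assignments (key is 1111, map ends up nil) is then a separate, simpler step.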

o- commented 7 years ago

would it make sense to use assumptions to speculate on the things we cannot analyze, like vector length and such?

gasche commented 7 years ago

Yes!

Insertion of assumptions: right now I can't think of an automatic assumption-insertion strategy besides the kind of branch pruning you implemented (branches are the natural thing to speculate on). The way I currently think of OSRs in this context is more as story-telling: we write an example, point out that if we had a profiling apparatus it would reveal that this assumption is relevant, we manually insert the assumption (note: we could have code in place to make this easier, to "script" such examples), and then we optimize the resulting code (the new version).

Use of existing assumptions: right now our optimizations don't really take advantage of existing assumptions (in the "new version" in the paragraph above). The constant folder could recognize variable-equal-value or array-length assumptions to enrich its static environment, but we have not implemented that yet. You are very right that this would be an important thing to experiment with.

I propose to try to clean up the part I have already done (not the magic unrolling of the last message), see if we can merge some of it into the master branch, and then try to make use of assumptions as you suggest.