So I played a bit with better array shape approximations, but my code is a mess right now (it's just a prototype); it is more about investigating what can easily be done.
I discovered patterns of an interesting form that we cannot currently optimize (the comments come from inlined code, but they are useful to make sense of what is happening):
# This is an internal helper, only used by `obj_add` and `obj_find`.
# Declare variables.
var map = nil
# Check if `obj` is nil.
map <- nil
goto begin
begin:
# If the map entry is empty, we're done.
branch (map == nil) $error_2 $loop
$loop:
# Check if the current entry is the right one.
var candidate = map[0]
branch (candidate == 1111) $done_1 $next
$next:
# We did not find the right entry, so try the next one.
# Get the next entry
drop candidate
map <- map[2]
goto begin
$done_1:
# We found the right entry, so return its value.
res <- map
[...]
$error_2:
res <- nil
[...]
This code loops over a linked-list `map`, stopping when it becomes `nil`. But the linked list is already `nil` at the beginning, so in theory the whole loop can be removed.
In practice that does not happen because the loop-exit condition `map == nil` is within the loop (after the loop label `begin`), and the body of the loop assigns the variable `map` (to the tail of the linked list) when it runs. Constant folding will realize that in the else-branch of the test it cannot predict the value of `map`, and when merging back at the label `begin` it will thus mark `map` as unpredictable -- and not optimize the test.
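To make that merge point concrete, here is a minimal OCaml sketch of a flat constant-propagation lattice and the pointwise join a folder would perform at `begin` (the types and names are made up for illustration, not sourir's analysis code): joining the entry fact `map = nil` with the unknown fact flowing along the back edge yields "unknown", which is why the test is not folded.

```ocaml
(* A flat lattice for constant folding: either the exact value of a
   variable is known, or nothing is known about it.
   Types and names are made up for illustration, not sourir's. *)
type value = Nil | Int of int
type avalue = Known of value | Top

(* Join of two abstract values: they stay Known only if they agree. *)
let join_avalue a b =
  match a, b with
  | Known x, Known y when x = y -> Known x
  | _ -> Top

(* Abstract environments map variable names to abstract values. *)
module Env = Map.Make (String)

(* Pointwise join at a control-flow merge point such as `begin`. *)
let join_env e1 e2 =
  Env.merge
    (fun _x a b ->
      match a, b with
      | Some a, Some b -> Some (join_avalue a b)
      | _ -> Some Top)
    e1 e2

let () =
  (* Agreeing constants survive the join... *)
  assert (join_avalue (Known (Int 1111)) (Known (Int 1111)) = Known (Int 1111));
  (* ...but at `begin` the entry fact map = nil meets the back edge,
     where map was reassigned to map[2] and is therefore unknown. *)
  let from_entry = Env.singleton "map" (Known Nil) in
  let from_back_edge = Env.singleton "map" Top in
  match Env.find "map" (join_env from_entry from_back_edge) with
  | Known _ -> print_endline "map is still a known constant at begin"
  | Top -> print_endline "map is unknown at begin: the test cannot be folded"
```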
If we wanted this optimization to happen, we would need a constant-folding flow analysis where the current approximation environment influences the control-flow traversal (and not just the values being propagated): if we have a `branch e $l1 $l2` where `e` approximates to `true`, then we should continue computing the fixpoint in `$l1` but not in `$l2` (so if `$l2` was not traversed yet, it will remain counted as dead code).
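As a rough sketch of what that traversal could look like (with invented instruction and value types, not sourir's), the worklist only receives the successors that the current approximation says are reachable, so a label that is never enqueued stays dead:

```ocaml
(* Sketch of a constant folder whose traversal is directed by the
   abstract environment: when `branch e $l1 $l2` folds to a constant,
   only the taken successor is put on the worklist, so the other label
   stays unvisited and can be reported as dead code.
   Types and names are illustrative, not sourir's. *)
type avalue = Known of bool | Top          (* abstract value of a condition *)
type label = string

type terminator =
  | Branch of (unit -> avalue) * label * label  (* condition, $l1, $l2 *)
  | Goto of label
  | Stop

(* Successors the analysis is allowed to follow under the approximation. *)
let live_successors = function
  | Goto l -> [ l ]
  | Stop -> []
  | Branch (cond, l1, l2) ->
    (match cond () with
     | Known true -> [ l1 ]          (* continue the fixpoint only in $l1 *)
     | Known false -> [ l2 ]         (* ... or only in $l2 *)
     | Top -> [ l1; l2 ])            (* unknown: both may be live *)

(* Worklist traversal: labels never reached remain dead. *)
let reachable (cfg : (label * terminator) list) (entry : label) =
  let visited = Hashtbl.create 16 in
  let rec go = function
    | [] -> ()
    | l :: rest when Hashtbl.mem visited l -> go rest
    | l :: rest ->
      Hashtbl.add visited l ();
      go (live_successors (List.assoc l cfg) @ rest)
  in
  go [ entry ];
  visited

let () =
  (* An unknown condition keeps both successors live. *)
  assert (live_successors (Branch ((fun () -> Top), "$a", "$b")) = [ "$a"; "$b" ]);
  (* begin: branch (map == nil) $error_2 $loop, with map known to be nil *)
  let cfg =
    [ "begin", Branch ((fun () -> Known true), "$error_2", "$loop");
      "$loop", Goto "begin";
      "$error_2", Stop ]
  in
  let live = reachable cfg "begin" in
  List.iter
    (fun (l, _) ->
      Printf.printf "%s: %s\n" l (if Hashtbl.mem live l then "live" else "dead"))
    cfg
```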
(Right now the resulting code has one small defect: it does not quite have the same semantics as the input code. But that's to be fixed and, besides, the point is that it's branch-free, right?)
(The reason why the code is broken is that array approximations don't currently take aliasing into account. But that's not too hard to fix, by approximating the heap, I suppose.)
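One plausible shape for such a heap approximation (purely illustrative types, not what the prototype does): variables hold abstract locations, array contents live in a store indexed by location, and a write through one alias is therefore visible through every other alias of the same location.

```ocaml
(* Sketch of an abstract heap for array approximations, so that aliases
   are handled: variables hold abstract locations, and array contents
   live in a separate store indexed by location. Writing through one
   alias updates the location that all aliases point to.
   This is only an illustration, not sourir's representation. *)
type loc = int
type avalue = Nil | Int of int | Ref of loc

module Env = Map.Make (String)   (* variable -> abstract value *)
module Heap = Map.Make (Int)     (* location -> abstract array contents *)

(* x <- [v1, ..., vn] : allocate a fresh location *)
let alloc heap contents =
  let l = Heap.cardinal heap in
  l, Heap.add l contents heap

(* x[i] <- v : update the array *through its location* *)
let store heap l i v =
  let a = Heap.find l heap in
  Heap.add l (Array.mapi (fun j old -> if j = i then v else old) a) heap

let () =
  (* var a = [1111, nil]; var b = a   (b aliases a) *)
  let l, heap = alloc Heap.empty [| Int 1111; Nil |] in
  let env = Env.(empty |> add "a" (Ref l) |> add "b" (Ref l)) in
  (* b[0] <- 2222 *)
  let heap =
    match Env.find "b" env with
    | Ref l -> store heap l 0 (Int 2222)
    | _ -> heap
  in
  (* reading a[0] now sees the update made through b *)
  match Env.find "a" env with
  | Ref l ->
    (match (Heap.find l heap).(0) with
     | Int n -> Printf.printf "a[0] approximates to %d\n" n
     | _ -> print_endline "a[0] is not an int")
  | _ -> print_endline "a is not an array"
```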
So fixing the aliasing handling reveals that if you want to optimize well, you need constant-folding-directed loop unrolling (and not just constant-folding-directed branch taking). Consider a linked list with a single cell:
var map = [1111, nil]
And a loop that wants to find the key `1113` in it, or learn that it does not exist:
begin:
branch (map == nil) $key_does_not_exist $try_cell # loop branch
$try_cell:
var key = map[0]
branch (key == 1113) $key_exists $try_next # key branch
$try_next:
map <- map[1]
goto begin
When constant folding over this with the known value of `map` above, it encounters the loop branch and predicts `false`; then it also correctly predicts that the key branch is `false`, goes to `$try_next`, and correctly updates the static information to `map = nil`. Perfect. But then it goes to `begin` again and notices that the branch is now `true` (in the general case it may not know; the point is that `false` is not a valid prediction anymore), so the merge of the two flows of control pessimizes the static information and forbids removing the branches and the loop: the static information at `begin` when you reach the fixpoint is "I don't know what `map` is".
If we wanted to get branch-free code out of this (like you would collect in a tracelet), we would have to duplicate `begin` into several copies based on the static information: the second time we jump to `begin`, we could instead jump to a copy of it and continue analyzing there. This corresponds to on-demand unrolling, where the demand comes from the optimizer -- in effect a symbolic evaluator. (In general this may not terminate, for example `for (int i = 0; true; i++)` could iterate infinitely, but we could have a cutoff bound per label.)
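A toy version of that optimizer-driven unrolling, on the single-cell example above (the copy naming and the cutoff policy are invented for the sketch):

```ocaml
(* Sketch of optimizer-driven ("on demand") unrolling: each time the
   analysis reaches `begin` with a known value of `map`, it continues in
   a fresh copy of the label instead of merging back, until a per-label
   cutoff is hit.  The copy naming and cutoff policy are invented. *)
type value = Nil | Cons of int * value   (* a linked-list map: [key, tail] *)

let cutoff = 8

(* Symbolically evaluate the search loop for `key`, recording the branch
   decision taken in each copy begin#0, begin#1, ... of the label. *)
let unroll key map =
  let rec go i map acc =
    if i >= cutoff then
      List.rev (Printf.sprintf "begin#%d: cutoff reached, keep the loop" i :: acc)
    else
      match map with
      | Nil ->
        (* the loop branch folds to true: go to $key_does_not_exist *)
        List.rev
          (Printf.sprintf "begin#%d: map == nil is true -> $key_does_not_exist" i :: acc)
      | Cons (k, tail) ->
        if k = key then
          List.rev (Printf.sprintf "begin#%d: key branch is true -> $key_exists" i :: acc)
        else
          (* both branches fold to false: fall through $try_next and keep
             analyzing in a fresh copy of `begin` *)
          go (i + 1) tail (Printf.sprintf "begin#%d: both branches false -> unroll" i :: acc)
  in
  go 0 map []

let () =
  let map = Cons (1111, Nil) in   (* var map = [1111, nil] *)
  List.iter print_endline (unroll 1113 map)
```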
> Would it make sense to use assumptions to speculate on the things we cannot analyze, like vector length and such?
Yes!
Insertion of assumptions: right now I can't think of an automatic assumption-insertion strategy besides the kind of branch pruning you implemented (branches are the natural thing to speculate on). Right now the way I think of OSRs in this context is more as story-telling: we write an example, point out that if we had a profiling apparatus it would reveal that this assumption is relevant, we manually insert the assumption (note: we could have code in place to make this easier, to "script" such examples), and then we optimize the resulting code (the new version).
Use of existing assumptions: right now our optimizations don't really take advantage of existing assumptions (in the "new version" in the paragraph above). The constant folder could recognize variable-equal-value or array-length assumptions to enrich its static environment, but we have not implemented that yet. You are very right that this would be an important thing to experiment with.
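For example, the folder could refine its abstract environment from the predicates guarded by an assumption, since execution only continues in that version when they hold. A minimal sketch (the predicate shapes and names below are illustrative, not sourir's actual assume syntax):

```ocaml
(* Sketch of letting the constant folder consume existing assumptions:
   an assumption guarding `x == c` (or a vector length) lets the folder
   refine its abstract environment after that point, because execution
   only continues in this version when the predicate holds.
   Predicate shapes and names are illustrative, not sourir's syntax. *)
type avalue = Known of int | KnownLength of int

type predicate =
  | VarEq of string * int          (* assume x == c *)
  | LengthEq of string * int       (* assume length(x) == n *)

module Env = Map.Make (String)

(* Refine the environment with the predicates of an assumption. *)
let refine env preds =
  List.fold_left
    (fun env p ->
      match p with
      | VarEq (x, c) -> Env.add x (Known c) env
      | LengthEq (x, n) -> Env.add x (KnownLength n) env)
    env preds

let () =
  (* assume [x == 1111, length(v) == 2] else deoptimize ... *)
  let env = refine Env.empty [ VarEq ("x", 1111); LengthEq ("v", 2) ] in
  (match Env.find_opt "x" env with
   | Some (Known n) -> Printf.printf "after the assume, x folds to %d\n" n
   | _ -> print_endline "x is unknown");
  match Env.find_opt "v" env with
  | Some (KnownLength n) -> Printf.printf "bounds checks on v can use length %d\n" n
  | _ -> print_endline "the length of v is unknown"
```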
I propose to try to clean up the parts I have already done (not the magic unrolling of the last message), see if we can merge some of them into the master branch, and then try to make use of assumptions as you suggest.
From https://github.com/reactorlabs/sourir/pull/131#issuecomment-306560111
Also https://github.com/reactorlabs/sourir/pull/131#issuecomment-306561430
There are probably other cases we can consider.