Here is how I plan to approach this:

1. Add `call` and `parentCall` to the primitive as their original expressions. Don't worry about any other checks.

2. Support `A[B[expr]]`; we still require that we can get the call base for `call` and `parentCall`. We'll need to comment out the check in `analyzeCandidateIE()` that looks at `call`'s arguments and aborts if any of them are `CallExpr`s. The checks we do in `isLoopDomainValidIE()` need to be updated; specifically, the function `callHasSymArguments()` needs to be modified to be less restrictive.

3. Add a check on `expr` that ensures it only involves `i` and constants, and only does arithmetic operations (i.e., no function calls). This can be done during normalization. If it passes the check, nothing else needs to be changed.

4. Support `A[expr(B[expr])]`, which is to say we can have things like `A[B[expr] op val]`. In this case, `val` can be a constant or the loop index variable, and `op` is a binary arithmetic operator. The hardest part here will be the logic to determine that we have `B` in there somewhere. It may not be too bad, since what we are looking for is a `B` whose parent is a `CallExpr` that does some arithmetic operation, whose parent in turn is a `CallExpr` that is `A`. This can be generalized so we just keep looking up until we hit a `CallExpr` that we can get the call base for. But is there something we can check to easily indicate that we don't have something valid? I suppose if we hit anything that isn't a `CallExpr`, we don't have a candidate.

5. Allow other variables in `expr` if they are yielded by some loop that iterates over an array or domain defined outside of the `forall`. During normalization, we can determine whether they come from "things" outside of the `forall`, but we'll have to wait until prefolding to know that they come from arrays/domains. To this end, we need to add the potential arrays/domains to the primitive before prefolding, and then track modifications to them in post resolution. They will be in the same category as `B`, in that we cannot have any modifications to them directly outside of the `forall`.

I created a new branch to implement this, as it will probably take a good chunk of new code: `non-affine-analysis`.
I have accomplished (1) from the above post. We stick `call` and `parentCall` into the primitive as `Expr`s, and replace `parentCall` with the primitive. This seems to be working as intended for our real applications, as well as small test cases. During prefolding, we mostly just need `call` to pass directly into `inspectAccess` and `executeAccess`; this is much easier than what we did before, which was to literally build up a new `CallExpr` for `B[i]`. And when we need to revert the optimization, we simply return `parentCall` from our prefolding function, which also accomplishes what we did before but in less code (i.e., we don't need to build up `A[B[i]]` manually).
We also still need the call bases `A` and `B` added to the primitive for the purposes of some static checks in prefolding and later in post resolution. As we move onward, we'll need to do a bit more heavy lifting to extract `B` when the expression is more complicated (specifically, case (5) above will require this).
The commit for doing (1) is here: https://github.com/thomasrolinger/chapel/commit/1d463761e4451f8fdf2ab25b56e8bcb57c53ba4f
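To make the shape of that transformation concrete, here is a minimal, self-contained sketch. It uses stand-in types rather than the real compiler classes (the `Expr`/`IrregAccessPrim` structs and the signatures of `inspectAccess`, `executeAccess`, and `prefoldIrregAccess` below are illustrative assumptions, not the actual Chapel compiler API): the primitive stashes `call` and `parentCall` as whole expressions, prefolding forwards `call` to the inspector/executor helpers, and reverting just hands back `parentCall`.

```cpp
// Stand-in types, not the real Chapel compiler classes or primitive.
#include <iostream>
#include <string>

struct Expr { std::string text; };        // stand-in for the compiler's Expr

struct IrregAccessPrim {                  // stand-in for the primitive's payload
  Expr* call;                             // B[expr]
  Expr* parentCall;                       // A[...B[expr]...]
  Expr* baseA;                            // call base of parentCall
  Expr* baseB;                            // call base of call
};

// Stand-ins for the existing inspectAccess/executeAccess rewrites.
Expr* inspectAccess(Expr* access) { return new Expr{"inspect(" + access->text + ")"}; }
Expr* executeAccess(Expr* access) { return new Expr{"execute(" + access->text + ")"}; }

// Prefold-time decision: forward `call` if the optimization still applies,
// otherwise return `parentCall`, which restores the original access as-is.
Expr* prefoldIrregAccess(IrregAccessPrim* p, bool stillValid, bool inspectorPhase) {
  if (!stillValid) return p->parentCall;  // revert: nothing to rebuild
  return inspectorPhase ? inspectAccess(p->call) : executeAccess(p->call);
}

int main() {
  IrregAccessPrim p{new Expr{"B[i]"}, new Expr{"A[B[i]]"}, new Expr{"A"}, new Expr{"B"}};
  std::cout << prefoldIrregAccess(&p, /*stillValid=*/true,  /*inspectorPhase=*/true)->text << "\n";
  std::cout << prefoldIrregAccess(&p, /*stillValid=*/false, /*inspectorPhase=*/true)->text << "\n";
}
```

The design point is that reverting requires no reconstruction of the original access, since `parentCall` was never taken apart.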
Addressing (2):
It was a bit more involved than originally thought. The function `callHasSymArguments()` that is called from `isLoopDomainValidIE()` is very restrictive, in that it ensures that the `B[expr]` we pass in is strictly of the form `B[i]`, where `i` is yielded by the loop. To support `B[expr]`, we made a copy of this function and renamed it `callHasSymArgumentsIE()`. This version allows for more complex expressions, specifically anything that involves the loop index and immediates. If one of the arguments to `B[expr]` is a `CallExpr`, it calls the function recursively to look at the arguments to that call.
All tests, small and real-world, seem to work as intended (though we have not yet introduced new tests that specifically exercise cases where the optimization should be invalid). Here is the commit: https://github.com/thomasrolinger/chapel/commit/2b4111400a06ac58fd256d6a2b316f94c8a037c2
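A simplified model of that relaxed check is sketched below. The real `callHasSymArgumentsIE()` operates on the compiler's `CallExpr`/`SymExpr` classes; the `Node` struct here is a stand-in expression tree, and passing the loop-index symbols in as an explicit set is an assumption about how they are tracked.

```cpp
// An argument is acceptable if it is an immediate, one of the yielded
// loop-index symbols, or a call whose own arguments all pass recursively.
#include <set>
#include <string>
#include <vector>

struct Node {
  bool isCall = false;              // models CallExpr vs. SymExpr
  bool isImmediate = false;         // literal constant
  std::string symbol;               // symbol name when it is a SymExpr
  std::vector<Node*> args;          // actuals when it is a call
};

bool callHasSymArgumentsIE(const Node* call,
                           const std::set<std::string>& loopIdxSyms) {
  for (const Node* arg : call->args) {
    if (arg->isCall) {
      if (!callHasSymArgumentsIE(arg, loopIdxSyms)) return false;   // recurse
    } else if (!arg->isImmediate && loopIdxSyms.count(arg->symbol) == 0) {
      return false;                 // some other variable we cannot reason about
    }
  }
  return true;
}

int main() {
  Node i{false, false, "i", {}};
  Node one{false, true, "", {}};
  Node plus{true, false, "", {&i, &one}};     // i + 1
  Node access{true, false, "", {&plus}};      // B[i + 1]
  return callHasSymArgumentsIE(&access, {"i"}) ? 0 : 1;
}
```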
Addressing (3):
We simply added a new function, `isCallExprBinaryOperator()`, that checks whether a given `CallExpr` is a known binary operator. We limit ourselves to `+ - * / % ** & | ^ << >> && || == != <= >= < >`. If we detect such an operator, then we're OK. Otherwise, we do not support it and will not do the optimization.

This works correctly on the test cases, and I ensured it does catch the case when there is a function call. Here is the commit: https://github.com/thomasrolinger/chapel/commit/afef79ffb28027398533fe63755a1ba97bcf18df
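The whitelist itself is small; here is a sketch of the idea (the real `isCallExprBinaryOperator()` inspects the call's unresolved base symbol, whereas this stand-in just tests an operator name directly):

```cpp
#include <cassert>
#include <set>
#include <string>

// Operators we are willing to look through when searching for B[expr].
bool isKnownBinaryOperator(const std::string& op) {
  static const std::set<std::string> ops = {
    "+", "-", "*", "/", "%", "**", "&", "|", "^", "<<", ">>",
    "&&", "||", "==", "!=", "<=", ">=", "<", ">"
  };
  return ops.count(op) != 0;
}

int main() {
  assert(isKnownBinaryOperator("+"));
  assert(!isKnownBinaryOperator("+="));      // compound assignment: reject
  assert(!isKnownBinaryOperator("someFn"));  // function call: reject
}
```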
Addressing (4):
When we are looking at a `CallExpr` in `findCandidatesIE()`, we first check that its parent is a `CallExpr`, and then that the parent has only 1 actual and that we can get a call base for both the parent and the call. That allows us to find things like `A[B[expr]]`, but not `A[B[expr] + 1]`.

The full AST for `A[B[i]+1]` is as follows:
```
(185668 CallExpr
  (414305 SymExpr 'unknown A[185585]:_unknown[54]')
  (185669 CallExpr
    (185670 UnresolvedSymExpr '+')
    (185672 CallExpr
      (414306 SymExpr 'unknown B[185595]:_unknown[54]')
      (414307 SymExpr 'unknown i[185655]:_unknown[54]'))
    (185674 SymExpr 1 'val _literal_33[4974]:int(64)[13]')))
```
First, the check for `numActuals() == 1` will prevent us from moving forward, since we have 2 actuals in this case. Second, assuming that `call` is `B[i]`, then `parentCall` will be the `+` call; we cannot get a call base for that.

What we need to do is ensure that we do have `B[expr]` as `call` (i.e., we can get its call base). If we have that, then we look upward, always ensuring we have a `CallExpr` that is a valid binary operator (via `isCallExprBinaryOperator()`), and stop once we find a `CallExpr` that we can get the call base for. We also stop our search if we hit a `CallExpr` that is not a valid binary operator (like an assignment or `+=`); in that case, we don't have a valid candidate.
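A minimal sketch of that upward search, over a stand-in expression tree rather than the compiler's AST (the `Node` struct and the helper name `findParentAccess()` are illustrative assumptions):

```cpp
#include <set>
#include <string>

struct Node {
  bool isCall = false;
  bool hasCallBase = false;   // e.g. A(...) or B(...), as opposed to +(...)
  std::string op;             // operator name when it is an operator call
  Node* parent = nullptr;
};

bool isKnownBinaryOperator(const std::string& op) {   // same whitelist as the earlier sketch
  static const std::set<std::string> ops = {
    "+", "-", "*", "/", "%", "**", "&", "|", "^", "<<", ">>",
    "&&", "||", "==", "!=", "<=", ">=", "<", ">"
  };
  return ops.count(op) != 0;
}

// Climb from the B[expr] call through binary-operator CallExprs until we
// reach a CallExpr with a call base; that call becomes parentCall.
// Returns nullptr if the candidate is not valid.
Node* findParentAccess(Node* call /* the B[expr] call, which has a call base */) {
  for (Node* cur = call->parent; cur != nullptr && cur->isCall; cur = cur->parent) {
    if (cur->hasCallBase) return cur;                     // found A[...]
    if (!isKnownBinaryOperator(cur->op)) return nullptr;  // e.g. '=' or '+='
  }
  return nullptr;   // walked off the expression without finding a call base
}

int main() {
  Node A{true, true, "", nullptr};          // A[...]
  Node plus{true, false, "+", &A};          // B[i] + 1
  Node B{true, true, "", &plus};            // B[i]
  return findParentAccess(&B) == &A ? 0 : 1;
}
```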
Using the above approach, we should end up with `call` set to `B[i]` and `parentCall` set to the `A` call. We then pass both `call` and `parentCall` into `isLoopDomainValidIE()` and do an additional call to `callHasSymArgumentsIE()` for `parentCall`. We also need to pass `call` into that check so we can ignore it when we come across it as a `CallExpr` (it would otherwise be considered invalid w.r.t. our checks).
With the above implemented, it seems we can correctly support `A[B[expr] op expr]`, and all tests pass: https://github.com/thomasrolinger/chapel/commit/99ed488f779dd5737782feb80de2329bcf2ad583
Addressing (5):
We want to build up a vector of `Symbol`s to use in `callHasSymArgumentsIE()` when checking for non-immediates and non-`CallExpr`s in our expressions. These symbols will be from any loops "above" the `A[B[i]]` access. For example, consider:
```chapel
for i in A {
  forall j in B {
    for k in C {
      ...A[B[i]]...
    }
  }
}
```
In this case, we want to add `i`, `j`, and `k` to the vector of symbols. For now, this only considers loops that are "local" to the `A[B[i]]` access (i.e., we do not go across function calls).
So in the `analyzeCandidateIE()` function, we add a call to a new function, `extractAllLoopSyms()`, that starts at `call` and finds any `for` or `forall` loop above it. For each loop we find, we extract the loop index symbols and add them to a vector, but only if the particular loop index variable is yielded by a "valid" iterator. This means it must be something that could potentially be an array or domain (i.e., not a range like `j in 0..#3`). We also add the loop iterator symbol and its type to a vector, as we'll want that info during prefolding to ensure the variable is yielded by an array or domain. By "type", I mean whether it is a stand-alone array/domain, a dot-domain, or something like `foo.field`. We already do the same kind of checks for another symbol.
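A hypothetical, simplified sketch of that walk (the `Loop`/`LoopSym`/`IterKind` types are stand-ins for however the compiler represents the enclosing loops, and the iterator-kind classification is only illustrative):

```cpp
#include <string>
#include <vector>

enum class IterKind { ArrayOrDomain, DotDomain, FieldAccess, Range, Other };

struct Loop {
  std::string indexSym;       // e.g. "i"
  std::string iterSym;        // e.g. "A"
  IterKind kind;
  Loop* parent = nullptr;     // next enclosing loop; nullptr at the outermost
};

struct LoopSym { std::string indexSym, iterSym; IterKind kind; };

// Visit each enclosing for/forall that is local to the access and record the
// loop index symbol along with the iterator symbol and its "type", skipping
// loops whose iterator cannot be an array or domain (e.g. a range like 0..#3).
std::vector<LoopSym> extractAllLoopSyms(Loop* innermost) {
  std::vector<LoopSym> syms;
  for (Loop* l = innermost; l != nullptr; l = l->parent) {
    if (l->kind == IterKind::Range || l->kind == IterKind::Other)
      continue;               // cannot be an array/domain; skip this loop
    syms.push_back({l->indexSym, l->iterSym, l->kind});
  }
  return syms;
}

int main() {
  Loop outer{"i", "A", IterKind::ArrayOrDomain, nullptr};
  Loop mid  {"j", "B", IterKind::ArrayOrDomain, &outer};
  Loop inner{"k", "C", IterKind::ArrayOrDomain, &mid};
  return extractAllLoopSyms(&inner).size() == 3 ? 0 : 1;
}
```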
One thing we had to adjust: the original intent of `callHasSymArguments()` was to set the relevant index for the symbol used in the access. Since we allow for more complex accesses, this no longer applies. The issue is that the index is used later in `isLoopDomainValidIE()` to check things about the loop iterator. For now, we just assume that we only have 1 iterator, so the index is always 0.
To make things easier during prefolding (and more flexible), we will only pass loop iterator symbols on to prefolding if the corresponding loop index variable was actually used in the expression(s) of interest. There's no sense in checking/requiring things about something that is not relevant for the irregular access. To do this, we build up a `std::map` of the loop iterator symbols that correspond to loop index symbols actually used. That is the only "thing" we will carry forward when creating the primitive.
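A sketch of that filtering step, under the assumption that the map is keyed by the loop index symbol (the keying direction is not spelled out above), with plain strings standing in for the compiler's `Symbol`s:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

struct LoopSym { std::string indexSym, iterSym; };

// Keep only the iterator symbols whose index variables actually appear in
// the irregular-access expression; this is what gets carried forward.
std::map<std::string, std::string>
usedIteratorSyms(const std::vector<LoopSym>& loopSyms,
                 const std::set<std::string>& symsUsedInExpr) {
  std::map<std::string, std::string> used;    // index symbol -> iterator symbol
  for (const LoopSym& ls : loopSyms)
    if (symsUsedInExpr.count(ls.indexSym) != 0)
      used[ls.indexSym] = ls.iterSym;
  return used;
}

int main() {
  // For the nested i/j/k example, an access like A[B[i]] only uses i,
  // so only the iterator that yields i is passed on to prefolding.
  std::vector<LoopSym> syms = {{"i", "A"}, {"j", "B"}, {"k", "C"}};
  return usedIteratorSyms(syms, {"i"}).size() == 1 ? 0 : 1;
}
```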
Also, I changed the name `callHasSymArgumentsIE()` to `callHasValidExpression()`.
The next step is to pass all of the relevant loop iterator symbols into the `MAYBE_IRREG_ACCESS` primitive. When we get to prefolding, we'll ensure those symbols are either arrays or domains; if one of them isn't, we abort the optimization.

Following that, we'll want to hand those symbols off to post resolution and track changes to them. They can be written to anywhere outside of the outer loop (i.e., the same conditions we enforce for `B` and its domain).
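A sketch of the prefold-time validity check just described (stand-in types and a hypothetical helper name; the real check would consult the resolved types of the symbols carried by the primitive):

```cpp
#include <map>
#include <string>

enum class ResolvedKind { Array, Domain, Other };

// Every loop iterator symbol carried by the primitive must resolve to an
// array or a domain; otherwise the optimization reverts to the original access.
bool iteratorsAreArraysOrDomains(const std::map<std::string, ResolvedKind>& iters) {
  for (const auto& entry : iters)
    if (entry.second != ResolvedKind::Array && entry.second != ResolvedKind::Domain)
      return false;    // e.g. a range or a scalar: abort the optimization
  return true;
}

int main() {
  std::map<std::string, ResolvedKind> ok  = {{"A", ResolvedKind::Array}};
  std::map<std::string, ResolvedKind> bad = {{"r", ResolvedKind::Other}};
  return (iteratorsAreArraysOrDomains(ok) && !iteratorsAreArraysOrDomains(bad)) ? 0 : 1;
}
```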
Here is the commit for all of the changes mentioned above: https://github.com/thomasrolinger/chapel/commit/ddb6a39e467d3e31c1a0800d9a3b264ebc335506
The final thing we can do is leverage the `analyzeCandidateIE()` approach for `analyzeCandidateIWA()`. This is actually very easy because there are fewer restrictions in place: we can have function calls, etc., since we don't have to worry about the inspector. In the end, we just require `call` to have a call base, and for it to be nested in a `CallExpr` that also has a call base.
Here is the commit that does that: https://github.com/thomasrolinger/chapel/commit/46d992db767a8cfc8fbd1f5322d7e8f15091c0f7
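The relaxed IWA test then reduces to something like the following sketch (stand-in tree again; whether the enclosing call must be the direct parent or may be reached through intervening calls is an assumption here):

```cpp
struct Node {
  bool isCall = false;
  bool hasCallBase = false;
  Node* parent = nullptr;
};

// All we require: `call` has a call base and some enclosing CallExpr does too.
bool isIWACandidate(const Node* call) {
  if (!call->isCall || !call->hasCallBase) return false;
  for (const Node* cur = call->parent; cur != nullptr && cur->isCall; cur = cur->parent)
    if (cur->hasCallBase) return true;      // found the enclosing access
  return false;
}

int main() {
  Node A{true, true, nullptr};      // A[...]
  Node B{true, true, &A};           // B[expr] nested under A
  return isIWACandidate(&B) ? 0 : 1;
}
```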
And we've merged the `non-affine-analysis` branch into main: https://github.com/thomasrolinger/chapel/commit/5f7f83ab64f12694443745b4dede04dacca1d5a9
For the inspector-executor optimization, we require that we have something like `A[B[i]]`. However, we could just as well handle something like `A[B[i+1]]` or `A[B[i]+1]`.

The original purpose for restricting ourselves to `A[B[i]]` is that we want to be able to reason about how the accesses into `B` could change. To this end, `i` must be the loop index variable and it must be yielded by an array or domain. But doing an arithmetic operation on `i` with either a constant or some other variable that we can reason about would be fine too.

Another reason why we just looked at `A[B[i]]` is that when we create the compiler primitive for the access, we just include `A`, `B`, and `i`. During prefolding, we rebuild the `A[B[i]]` access as needed. I wasn't aware that we could add entire expressions to the primitive, assuming we remove them from the AST first.

So, we can allow for basic affine expressions in the irregular access as follows:
- Instead of turning the symbols in `A[B[i]]` into Symbols for the primitive, do what we do for aggregation: add `call` and `parentCall` as their original expressions.

- Still require that `i` is the loop index variable and yielded by an array/domain, but allow for any arithmetic operations with `i` and constants. Also allow for `B[i]` to be involved in any arithmetic operation with constants.

- Allow other variables in the expression if they are yielded by loops over arrays/domains (defined outside of the `forall`).

To this end, we can support accesses of the form `A[expr(B[expr])]`, where `expr` is limited to arithmetic operations involving only constants and loop variables derived from arrays/domains that we can track.

We can't allow just any variables to be used in the expression, even something like `const k = C[idx]`. While we can reason that `k` won't change after it is declared, we would have to do further reasoning about when `idx` changes. This quickly gets very nasty and recursive. So we just won't allow variables that are not yielded by a loop over an array or domain. So we'd be able to handle something like the nested for/forall example shown earlier, where every variable in the index expression is yielded by a loop over an array or domain.

I don't see this as too difficult to do, and almost all of it can be done during normalization. We will need to do some checks in prefolding to ensure that all of the variables in the expressions are derived from arrays/domains. Since we can have an arbitrary number of variables in the expressions, and we need to add all of the arrays/domains that they are taken from to the primitive, we'll need to beef up the primitive some. We can do this with an int we add to specify how many such arrays/domains we added.
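A sketch of what that "beefed up" primitive payload could look like (the struct and field names are hypothetical; in the actual primitive this would be a flat argument list, which is why an explicit count travels alongside the symbols):

```cpp
#include <string>
#include <vector>

struct IrregAccessPrimArgs {
  std::string call;                        // B[expr], stored as the original expression
  std::string parentCall;                  // A[...expr...]
  std::string baseA, baseB;                // call bases
  int numTrackedIterators = 0;             // how many array/domain symbols follow
  std::vector<std::string> trackedIterators;
};

// Record the count alongside the variable-length list of arrays/domains that
// yield the loop index variables used in the expression.
IrregAccessPrimArgs buildPrimArgs(const std::string& call,
                                  const std::string& parentCall,
                                  const std::string& baseA,
                                  const std::string& baseB,
                                  const std::vector<std::string>& iters) {
  IrregAccessPrimArgs p{call, parentCall, baseA, baseB,
                        static_cast<int>(iters.size()), iters};
  return p;
}

int main() {
  // Hypothetical example: an access whose index expression uses two loop
  // index variables, yielded by iterators D1 and D2.
  IrregAccessPrimArgs p = buildPrimArgs("B[i+j]", "A[B[i+j]]", "A", "B", {"D1", "D2"});
  return p.numTrackedIterators == 2 ? 0 : 1;
}
```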