thomasrolinger / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Allow for basic affine expression in irregular access #37

Closed thomasrolinger closed 2 years ago

thomasrolinger commented 2 years ago

For the inspector-executor optimization, we require that we have something like A[B[i]]. However, we could just as well handle something like A[B[i+1]] or A[B[i]+1].

The original reason for restricting ourselves to A[B[i]] is that we want to be able to reason about how the accesses into B could change. To this end, i must be the loop index variable and it must be yielded by an array or domain. But performing an arithmetic operation on i with either a constant or some other variable that we can reason about would be fine too.

Another reason why we just looked at A[B[i]] is because when we create the compiler primitive for the access, we just include A, B and i. During prefolding, we rebuild the A[B[i]] access as needed. I wasn't aware that we could add entire expressions to the primitive, assuming we remove them from the AST first.

So, we can allow for basic affine expressions in the irregular access as follows:

Specifically, we can support accesses of the form A[expr(B[expr])], where expr is limited to arithmetic operations involving only constants and loop index variables derived from arrays/domains that we can track.

We can't allow arbitrary variables to be used in the expression, even something like const k = C[idx]. While we can reason that k won't change after it is declared, we would have to do further reasoning about when idx changes. This quickly gets very nasty and recursive. So we just won't allow variables that are not yielded by a loop over an array or domain. That means we'd be able to handle something like:

forall i in C.domain {
   for j in D.domain {
      for k in Y {
         res[i] += A[B[i+j*1] + k];
      }
   }
}

I don't see this as too difficult to do, and most of it can be done during normalization. We will need to do some checks in prefolding to ensure that all of the variables in the expressions are derived from arrays/domains. Since we can have an arbitrary number of variables in the expressions, and we need to add all of the arrays/domains that they are taken from to the primitive, we'll need to beef up the primitive a bit. We can do this by adding an int that specifies how many such arrays/domains we appended.

thomasrolinger commented 2 years ago

Here is how I plan to approach this:

  1. Modify the primitive so we pass in call and parentCall. Don't worry about any other checks.
  2. Allow for something like A[B[expr]]; we still require that we can get the call base for call and parentCall. We'll need to comment out the check in analyzeCandidateIE() that looks at call's arguments and aborts if any of them are CallExprs. The checks we do in isLoopDomainValidIE() need to be updated, specifically the function callHasSymArguments() needs to be modified to be less restrictive.
  3. Building off (2), insert checks that look more closely at expr and ensure that it only involves i and constants, and only performs arithmetic operations (i.e., no function calls). This can be done during normalization. If it passes the check, nothing else needs to be changed.
  4. Allow for A[expr(B[expr])], which is to say we can have things like A[B[expr] op val]. In this case, val can be a constant or the loop index variable, and op is a binary arithmetic operator. The hardest part here will be the logic to determine we have B in there somewhere. It may not be too bad, since what we are looking for is a B whose parent is a CallExpr that does some arithmetic operation, and its parent is a CallExpr that is A. This can be generalized so we just keep looking up until we hit a CallExpr that we can get the call base for. But is there something we can check for to easily indicate that we don't have something valid? I suppose if we hit anything that isn't a CallExpr, we don't have a candidate.
  5. Finally, allow for variables to be used in expr if they are yielded by some loop that iterates over an array or domain defined outside of the forall. During normalization, we can determine whether they come from "things" outside of the forall, but we'll have to wait until prefolding to know that they come from arrays/domains. To this end, we need to add the potential arrays/domains to the primitive before prefolding, and then track modifications to them in post resolution. They will be in the same category as B, in that we cannot have any modifications to them directly outside of the forall.
thomasrolinger commented 2 years ago

I created a new branch to implement this, as it will probably take a good chunk of new code: non-affine-analysis

I have accomplished (1) from the above post. We stick call and parentCall into the primitive as Exprs, and replace parentCall with the primitive. This seems to be working as intended for our real applications as well as small test cases. We mostly just need call during prefolding to pass directly into inspectAccess and executeAccess; this is much easier than what we did before, which was to literally build up a new CallExpr for B[i]. And when we need to revert the optimization, we simply return parentCall from our prefolding function, which also accomplishes what we did before but in less code (i.e., we don't need to build up A[B[i]] manually).

We also still need the call bases A and B added to the primitive for the purposes of some static checks in prefolding and later in post resolution. As we move onward, we'll need to do a bit more heavy lifting to extract B when the expression is more complicated (specifically case (5) above will require this).

The commit for doing (1) is here: https://github.com/thomasrolinger/chapel/commit/1d463761e4451f8fdf2ab25b56e8bcb57c53ba4f

thomasrolinger commented 2 years ago

Addressing (2):

It was a bit more involved than originally thought. The function callHasSymArguments() that is called from isLoopDomainValidIE() is very restrictive, in that it ensures that the B[expr] we pass in is strictly of the form B[i] where i is yielded by the loop. To support B[expr], we made a copy of this function and renamed it as callHasSymArgumentsIE(). This version allows for more complex expressions, specifically anything that involves the loop index and immediates. If one of the arguments to B[expr] is a CallExpr, it calls the function recursively to look at the arguments to that call.

All tests, small and real-world, seem to work as intended (though we have not yet added new tests that specifically exercise cases that should be invalid). Here is the commit: https://github.com/thomasrolinger/chapel/commit/2b4111400a06ac58fd256d6a2b316f94c8a037c2

thomasrolinger commented 2 years ago

Addressing (3):

We simply added a new function, isCallExprBinaryOperator(), that checks whether a given CallExpr is a known binary operator. We limit ourselves to + - * / % ** & | ^ << >> && || == != <= >= < >. If we detect such an operator, then we're OK. Otherwise, we do not support it and will not do the optimization.

This works correctly on the test cases, and I ensured it does catch the case when there is a function call. Here is the commit: https://github.com/thomasrolinger/chapel/commit/afef79ffb28027398533fe63755a1ba97bcf18df

thomasrolinger commented 2 years ago

Addressing (4):

When we are looking at a CallExpr in findCandidatesIE(), we first check that its parent is a CallExpr, and then that the parent has only 1 actual and we can get a call-base for both the parent and the call. That allows us to find things like A[B[expr]], but not A[B[expr] + 1].

The full AST for A[B[i]+1] is as follows:

(185668 CallExpr
   (414305 SymExpr 'unknown A[185585]:_unknown[54]')
   (185669 CallExpr
      (185670 UnresolvedSymExpr '+')
      (185672 CallExpr
         (414306 SymExpr 'unknown B[185595]:_unknown[54]')
         (414307 SymExpr 'unknown i[185655]:_unknown[54]'))
      (185674 SymExpr 1 'val _literal_33[4974]:int(64)[13]')))

First, the check for numActuals() == 1 will prevent us from moving forward since we have 2 actuals in this case. Second, assuming that call is B[i], then parentCall will be the + call; we cannot get a call base for that.

What we need to do is ensure that we do have B[expr] as call (i.e., we can get its call base). If we have that, then we look up, always ensuring we have a CallExpr that is a valid binary operator (via isCallExprBinaryOperator()), and stop once we find a CallExpr that we can get the call base for. We stop our search if we hit a CallExpr that is not a valid binary operator (like an assignment or +=). In that case, we don't have a valid candidate.

Using the above approach, we should end up with call set to B[i] and parentCall set to A. We then pass both call and parentCall to isLoopDomainValidIE() and do an additional call to callHasSymArgumentsIE() for parentCall. We also need to pass call into that check so we can skip it when we encounter it as a CallExpr (it would otherwise be considered invalid w.r.t. our checks).

With the above implemented, it seems we can correctly support A[B[expr] op expr] and all tests pass: https://github.com/thomasrolinger/chapel/commit/99ed488f779dd5737782feb80de2329bcf2ad583

thomasrolinger commented 2 years ago

Addressing (5):

We want to build up a vector of Symbols to use in callHasSymArgumentsIE() when checking for non-immediates and non-CallExprs in our expressions. These symbols will be from any loops "above" the A[B[i]] access. For example, consider:

for i in A {
   forall j in B {
      for k in C {
         ...A[B[i]]...
      }
   }
}

In this case, we want to add i, j and k to the vector of symbols. For now, this only considers loops that are "local" to the A[B[i]] access (i.e., we do not go across function calls).

So in the analyzeCandidateIE() function, we add a call to a new function, extractAllLoopSyms(), that will start at call and find any for or forall loop above it. For each loop we find, we extract the loop index symbols and add them to a vector. But we only do this if the particular loop index variable is yielded by a "valid" iterator. This means it must be something that could potentially be an array or domain (i.e., not a range like j in 0..#3). We also add the loop iterator symbol and its type to a vector, as we'll want that info during prefolding to ensure the variable is yielded by an array or domain. By "type", I mean whether it is from a stand-alone array/domain, a dot-domain, or something like foo.field. We already do the same kinds of checks for other symbols.

One thing that we had to adjust is that the original intent of callHasSymArguments() was to set the relevant index for the symbol used in the access. Since we allow for more complex accesses, that index no longer applies. The issue is that the index is used later on in isLoopDomainValidIE() to perform checks on the loop iterator. For now, we just assume that we only have 1 iterator, so the index is always 0.

To make things easier during prefolding (and more flexible), we will only pass on loop iterator symbols to prefolding if the corresponding loop index variable was actually used in the expression(s) of interest. There's no sense in checking/requiring things about something that is not relevant for the irregular access. To do this, we build up a std::map of the loop iterator symbols that correspond to loop index symbols actually used. That is the only "thing" we will carry with us forward when creating the primitive.

Also, I changed the name callHasSymArgumentsIE() to callHasValidExpression().

The next step is to pass in all of the relevant loop iterator symbols into the MAYBE_IRREG_ACCESS primitive. When we get to prefolding, we'll ensure those symbols are either arrays or domains. If one of them isn't, we abort the optimization.

Following that, we'll want to hand off those symbols to post resolution and track changes to them. They can be written to anywhere outside of the outer loop (i.e., same conditions we enforce for B and its domain).

Here is the commit for all of the changes mentioned above: https://github.com/thomasrolinger/chapel/commit/ddb6a39e467d3e31c1a0800d9a3b264ebc335506

thomasrolinger commented 2 years ago

The final thing we can do is leverage the analyzeCandidateIE() approach for analyzeCandidateIWA(). This is actually very easy because we have fewer restrictions in place. We can have function calls, etc., since we don't have to worry about the inspector. In the end, we just require call to have a call base and to be nested in a CallExpr that also has a call base.

Here is the commit that does that: https://github.com/thomasrolinger/chapel/commit/46d992db767a8cfc8fbd1f5322d7e8f15091c0f7

And we've merged the non-affine-analysis branch into main: https://github.com/thomasrolinger/chapel/commit/5f7f83ab64f12694443745b4dede04dacca1d5a9