thomasrolinger / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
0 stars 1 forks source link

Task: Handle multiple optimizations #18

Closed thomasrolinger closed 2 years ago

thomasrolinger commented 2 years ago

If a given A is used in the same forall but with a different B each time, we will have issues because we'll be using the wrong communication schedule. Consider

func(A, B);
func(A, C);
func(A, B);

The first call will set up A.replArray according to B and then the second call will wipe that out and set it up according to C. But the third call will not detect that the inspector needs to run, so it'll be using the communication scheduled based on C.

We can resolve this by including a communication schedule (i.e., a ReplicatedElems field) for each unique call site. This is similar to what we do for the inspector flag hashing. Essentially, BlockArr will store an associative array of ReplicatedElems where we use a hash of the call site arguments to determine which record we use.

The other issue is that A may be used in two completely different forall loops, either with the same B or a different one. In that case, we also need a different communication schedule (ReplicatedElems) to deal with that. And that instance also needs to be an associative array like we discussed above.

So in the end, we're looking at something like:

// somewhere "global"
const D_ie : domain(int) = {0..#MAX_IE_OPTS};

// in BlockArr
var commSchedules : [D_ie] CommunicationSchedule;

// defined in some module of ours
record CommunicationSchedule {
   // fields for eltType, idxType, rank
   var D : domain(int);
   var scheds : [D] ReplicatedElems(...);
   var inspectorFlag : bool;
}

The question is how to associate an ID/hash with a given forall so that when we encounter the forall with some A, we know which communication schedule we are getting. One approach is to create something const forallID = .. and insert that before the forall. The compiler has an internal ID that it increases each time it encounters a candidate forall. We then pass that ID into our various calls.

NOTE: before we enter the forall, we'll want to pull out the ReplicatedElems instance so we can just pass that into inspectAccess and executeAccess, as doing the array indirection each time we call those will be very expensive. So something like:

const forallID = ....;
ref replArray = A.commSchedules[forallID].scheds[(B._instance.hash()):int];

and we can wrap that into a function like getCommunicationSchedule(A, B, forallID).

We also may want to adjust our naming a bit too, to make things consistent. "Communication schedules" seems like a good term to use. So maybe instead of "replArray" we'll use "commSched".

This shouldn't be too hard to do, and we can put it together once we finish our current experiments and code refactoring.

thomasrolinger commented 2 years ago

Addressed: https://github.com/thomasrolinger/chapel/commit/2ef016858566b51ad5fb87ce42cdc34e18d045ea

Not too much had to be changed here, it went more or less like how we described it above.

There was one particular tricky thing to get around. We now have the commSchedules field, which is an array of CommunicationSchedules, which is a record with an array of ReplicatedElems. When compiling NAS-CG, we got a compiler error related to the sorting we do to generate the matrix. This sorting creates a block distributed array where the elements are lists. This caused an issue with copy-initialization, as the CommunicationSchedule with that element type did not support copy initialization (according to the compiler; I do not know much more about why that is the case). It seems to be specifically tied to the scheds field, which is the array of ReplicatedElems records.

To get around this, I defined my own init= method for CommunicationSchedule and left out copying over the scheds field, which is what was ultimately causing the error.

I thought it would be "bad" to just leave that out, but I think it is OK. The only thing that should be using these schedules/records is our optimization. Furthermore, copy initialization happens in something like var A = otherArray, but we would flag this as invalid since we cannot find A's domain. So I don't think there is a case where our custom copy initialization would impact our optimization.

Note that something like A = otherArray only copies over the values of otherArray into A. So there is no harm there either.

We added two new test cases, one that tests multiple call sites where we use A multiple times and one that has multiple forall loops that A is used in: ie_test_multipleCallSites2.chpl and ie_test_MULTI-FUNC_sameArray.chpl