Closed hiker closed 23 hours ago
In #2489 some improvements are made to detect invalid loop fusion configuration. As indicated by: https://github.com/stfc/PSyclone/pull/2489#discussion_r1474362325
These tests are actually skipped in case of a GoLoop (a test for PSyLoop is done before, and these tests are only executed if it is not a PSyLoop).
Unfortunately, we run into a bug in VariableAccesses in case of a kernel. In gocean1p0.py:
def _record_stencil_accesses(self, signature, arg, var_accesses):
for j in [-1, 0, 1]:
for i in [-1, 0, 1]:
depth = arg.stencil.depth(i, j)
for current_depth in range(1, depth+1):
i_expr = GOKern._format_access("i", i, current_depth)
j_expr = GOKern._format_access("j", j, current_depth)
var_accesses.add_access(signature, arg.access,
self, [i_expr, j_expr])
This adds the indices as strings, but they should be PSyIR nodes (Reference
). If we have strings here, the dependency analysis will crash in the partition function:
partitions = dep_tools._partition(comp_1, comp_other,
File "/home/joerg/work/psyclone/src/psyclone/psyir/tools/dependency_tools.py", line 225, in _partition
indices_1 = comp_ind1.get_subscripts_of(set_of_loop_vars)
File "/home/joerg/work/psyclone/src/psyclone/core/component_indices.py", line 187, in get_subscripts_of
index_vars = VariablesAccessInfo(indx)
File "/home/joerg/work/psyclone/src/psyclone/core/variables_access_info.py", line 126, in __init__
raise InternalError(f"Error in VariablesAccessInfo. "
psyclone.errors.InternalError: PSyclone internal error: Error in VariablesAccessInfo. Argument must be a single Node in a schedule or a list of Nodes in a schedule but have been passed an object of type: <class 'str'>
}
The Fuse transformation does not take a potential stencil operation into account. Since now we have stencil information for gocean, this can be tested. Example:
In f2 when computing a(i,j) the indices a(i, j+-1) are used, which was defined in f1. But if the outer loops are fused:
when now computing a(i,j), again a(i, j+-1) is read. But now a(i,j+1) has not yet been computed in f1 when the value is needed in f2.
Two solutions: 1) we abort the fuse transformation if the above conditions is detected - though needs a careful look at the dependencies. 2) we do some kind of 'loop peeling', which also needs a close look at the dependencies. If required, we could do:
Or instead of the if test we could run the second j loop from 1 to jmax-2, and have another peeled loop for f2 at the end. The problem with this loop peeling is that it might not work well when parallelising the loop: we might not have enough iteration in the first peeled loop if we parallelise along j, or if we parallelise over i instead of j, we might access data on a remote socket if otherwise parallelisation is along j.
I'm happy to look at this next year.