hiker commented 5 years ago

The Fuse transformation does not take a potential stencil operation into account. Since now we have stencil information for gocean, this can be tested. Example:

  do j
     do i
        f1(i, j, a)    ! Assume a pointwise write kernel for a
     enddo
  enddo
  do j
     do i
        f2(i, j, a)    ! Assume a (010, 010, 010) read-write stencil access for a
     enddo
  enddo

In f2 when computing a(i,j) the indices a(i, j+-1) are used, which was defined in f1. But if the outer loops are fused:

  do j
     do i
        f1(i, j, a)    ! Assume a pointwise write kernel for a
     enddo
     do i
        f2(i, j, a)    ! Assume a (010, 010, 010) read-write stencil access for a
     enddo
  enddo

when now computing a(i,j), again a(i, j+-1) is read. But now a(i,j+1) has not yet been computed in f1 when the value is needed in f2.

Two solutions: 1) we abort the fuse transformation if the above conditions is detected - though needs a careful look at the dependencies. 2) we do some kind of 'loop peeling', which also needs a close look at the dependencies. If required, we could do:

  ! Pre-compute two rows
  do j=1, 2
     do i
        f1(i, j, a)    ! Assume a pointwise kernel for a
     enddo
  enddo

  do j
    if(j+2 <= j_max) then
       do i
            f1(i, j+2, a)    ! Assume a pointwise kernel for a
       enddo
     endif
     do i
          f2(i, j, a)    ! Assume a (010, 010, 010) stencil access for a --> now a(i, j+-1) are defined
     enddo
  enddo

Or instead of the if test we could run the second j loop from 1 to jmax-2, and have another peeled loop for f2 at the end. The problem with this loop peeling is that it might not work well when parallelising the loop: we might not have enough iteration in the first peeled loop if we parallelise along j, or if we parallelise over i instead of j, we might access data on a remote socket if otherwise parallelisation is along j.

I'm happy to look at this next year.

hiker commented 8 months ago

In #2489 some improvements are made to detect invalid loop fusion configuration. As indicated by: https://github.com/stfc/PSyclone/pull/2489#discussion_r1474362325

These tests are actually skipped in case of a GoLoop (a test for PSyLoop is done before, and these tests are only executed if it is not a PSyLoop).

Unfortunately, we run into a bug in VariableAccesses in case of a kernel. In gocean1p0.py:

  def _record_stencil_accesses(self, signature, arg, var_accesses):
        for j in [-1, 0, 1]:
            for i in [-1, 0, 1]:
                depth = arg.stencil.depth(i, j)
                for current_depth in range(1, depth+1):
                    i_expr = GOKern._format_access("i", i, current_depth)
                    j_expr = GOKern._format_access("j", j, current_depth)
                    var_accesses.add_access(signature, arg.access,
                                            self, [i_expr, j_expr])

This adds the indices as strings, but they should be PSyIR nodes (Reference). If we have strings here, the dependency analysis will crash in the partition function:

    partitions = dep_tools._partition(comp_1, comp_other,
      File "/home/joerg/work/psyclone/src/psyclone/psyir/tools/dependency_tools.py", line 225, in _partition
    indices_1 = comp_ind1.get_subscripts_of(set_of_loop_vars)
      File "/home/joerg/work/psyclone/src/psyclone/core/component_indices.py", line 187, in get_subscripts_of
    index_vars = VariablesAccessInfo(indx)
      File "/home/joerg/work/psyclone/src/psyclone/core/variables_access_info.py", line 126, in __init__
    raise InternalError(f"Error in VariablesAccessInfo. "
    psyclone.errors.InternalError: PSyclone internal error: Error in VariablesAccessInfo. Argument must be a single Node in a schedule or a list of Nodes in a schedule but have been passed an object of type: <class 'str'>
}

[ ] Create Reference for these array accesses
[ ] Verify that array indices are PSyIR nodes
[ ] Fix the loop_fuse_trans test to execute the tests even in case of having PSyLoops.

arporter commented 3 days ago

2707 is on now. Can this be closed @hiker?

stfc / PSyclone

Loop Fusion Transform can create incorrect code #257

2707 is on now. Can this be closed @hiker?