stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
104 stars 27 forks source link

DependencyAnalysis misses dependency on scalar #2727

Open sergisiso opened 2 hours ago

sergisiso commented 2 hours ago

@hiker This is a simplified version of an issue we hit on NEMOv5, here my_val sometimes keeps its value for the next iteration, so it is essentially serialising the loop. Could we do something about this?

def test_new(fortran_reader):
    '''Tests can_loop_be_parallelised.'''
    source = '''program test
                integer :: i, my_val
                integer, dimension(10) :: array

                do i = 1, 10
                  if (array(i) > 3) then
                    my_val = array(i)
                    array(i) = my_val
                  else
                    array(i) = my_val
                  endif
                end do

                end program test'''

    psyir = fortran_reader.psyir_from_source(source)
    loop = psyir.children[0].children[0]
    dep_tools = DependencyTools()
    parallel = dep_tools.can_loop_be_parallelised(loop)
    assert parallel is False
hiker commented 2 hours ago

I don't see an easy way :( I tried to rewrite it as something like:

do i=1, 10
    if (array(i)<=3) then
      inner: do j= i, 1, -1
          if (array(j)>3) then
              array(i) = array(j)
              break  inner
          endif
     enddo inner    ! not sure what to do if there is no element >3 :)
enddo

I believe that should roughly be the same logic - for each element <=3 find the previous element >3

But I assume the dependency analysis will still see read and write accesses to the same index in different iterations (i and j), and there actually is a race condition (consider 4, 1, 1, in array... when the loop index 2 and 3 ... the 1s ... are executed in parallel, loop index 3 might read the value from loop 2, while loop 2 is writing ... ) except of course it doesn't matter, since the same value will be written in loop index 3 (the 4, either from array index 2 or array index 1).

But I think this rewrite should allow you to execute this in parallel (by disabling the DA). And of course, handling the problem what happens if the first element is <=3 ... my_val would be uninitialised, and the search loop won't find a value.

sergisiso commented 55 minutes ago

I would like to try to identify the dependency in the original code as this leads to compliable but code with different semantics (psyclone adds a private(my_var) since it finds the write-write), and it's very difficult to find out where the difference in results come from.

What make it hard? Is it the fact that it is a "scalar"? If I do something like:

    def test_new2(fortran_reader):
        '''Tests can_loop_be_parallelised.'''
        source = '''program test
                    integer :: i
                    integer, dimension(10) :: array, my_val
                    do i = 1, 10
                        if (array(i) > 3) then
                          my_val(1) = 1
                          array(i) = my_val(1)
                        else
                          array(i) = my_val(1)
                        endif
                    end do
                    end program test'''

        psyir = fortran_reader.psyir_from_source(source)
        loop = psyir.children[0].children[0]
        dep_tools = DependencyTools()

        parallel = dep_tools.can_loop_be_parallelised(loop)
        assert parallel is False

It properly detects the write-write dependency on my_val in the outer loop

sergisiso commented 42 minutes ago

In both cases, if in both condition bodies there was a my_val( = write (or my_val(1) =), they would be parallelisable with the [first]private(my_val).

Ignoring the scalar version for now, the problem seems to be that the write-write is not exhaustive to all branches, and therefore some are carried to the next iteration. Would it be possible to differentiate between: