stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
103 stars 28 forks source link

Support Parallel Array Privatisation #2671

Open sergisiso opened 1 month ago

sergisiso commented 1 month ago

Currently we always consider arrays shared on parallel loops, and never privatise them. For example in:

real(kind=wp), dimension(nlay_i) :: ztmp
do ji = 1, npti, 1
  ...
  do jk = 1, nlay_i, 1
     ztmp(jk) = ...
  enddo
  ... uses ztmp ...
end do

We could privatise ztmp in the outer loop. We don't but this is not a problem because the dependency analysis also correctly reports: The write access to 'ztmp' in 'ztmp(jk)' causes a write-write race condition. So we skip the outer loop and only parallelise the inner one.

However, it would be more efficient to parallelise the outer loop and mark ztmp as private.

There are three solutions to explore:

@arporter @hiker @LonelyCat124 Any ideas for this? This seems a pattern that will be important for NEMO5

sergisiso commented 1 month ago

Note that previous options are not mutually exclusive, (1) could be an option that calls (2) as a meta-transformation, and (3) can be some logic inside the ParallelLoop apply that also calls (2) when needed. So maybe implementing (2) -> (1) -> (3) is an iterative path to the solution.

arporter commented 1 month ago

I think that making arrays thread-private on a GPU which has a lot of threads often causes problems. The solution that has worked in WW3 and which Simon is using in NEMO5 is to increase the dimensionality of the array. Simon's solution is to do this for all arrays beginning with z as, in NEMO, that means they are temporaries. He has PSyclone code we could take a look at.

sergisiso commented 1 month ago

Thanks @arporter I forgot about this and they are almost equivalent (maybe the difference is that one uses the heap and the privatisation probably the stack)

One more note about notation:

sergisiso commented 1 month ago

ECMWF has hit the same issues in NEMO5. From a conversation we had, in zdftke.f90 there is:

! PSyclone: Loop cannot be parallelised because the dependency analysis reported:
! Error: The write access to 'zmxlm' in 'zmxlm(idx_2,idx_1)' causes a write-write race condition. Variable: 'zmxlm'.
! Error: The write access to 'zmxld' in 'zmxld(idx_4,idx_3)' causes a write-write race condition. Variable: 'zmxld'.
! Consider using the "ignore_dependencies_for" transformation option if this is a false dependency.
do jj = ntsj - 0, ntej + 0, 1

The reason for it is that in NEMO5 these temporary variables have been reduced to 2D: REAL(wp), DIMENSION(ntsi-(0):ntei+(0),jpk) :: zmxlm, zmxld

and therefore they are being re-used for each iteration of the jj loop and can not be used in parallel. There is 2 solutions for this:

sergisiso commented 1 month ago

To identify which arrays to privatise/localise/increase_dimensionality, we need to also check that they are not "read" after the loop (in addition to being write-write inside the loop). For this, the .next_access property would be useful. @LonelyCat124 is there a separate issue to track this functionality (or was this just discussed inside the scalarisation and async issues)?

LonelyCat124 commented 1 month ago

I think it was just discussed in those issues, we should probably have a standalone issue

sergisiso commented 1 month ago

Note that there is also threadprivate: https://www.openmp.org/spec-html/5.0/openmpsu104.html#x137-5470002.19.2

To summarise:

Which one is the best is context/compiler/hardware specific. Ideally, we want the psyclone-script writer to choose what they want to do.

Again, the increase_dims and private are not mutually exclusive because if the dimension is increased and indexed by the outer loop index there will be no write-write dependency (but use more memory). If the dimension is increased by a heap allocted "get_num_threads" we need to tell parallelisation transformation to ignore dependencies in that symbol. But importantly they don't trigger the privatisation anymore.