stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License

[nemo] support private clause on OpenACC Loop directive #423

Open arporter opened 5 years ago

arporter commented 5 years ago

Sometimes a compiler is unable to determine whether a scalar variable that is written within a loop is used again after the loop. This prevents parallelisation of the loop. If the variable's value really isn't needed after the loop, this can be worked around by instructing the compiler to make the variable private to each iteration. The ACCLoopTrans transformation needs extending to support this.

arporter commented 5 years ago

For OpenMP we automatically generate a list of private variables for each parallel region at code generation time. This works for a generated PSy layer because we know what the scalar variables are and that it is safe to parallelise the loop. For frontends that work with existing Fortran code (such as NEMO), we don't have this information. Even once we have dependence analysis, code like:

scalar = 1.0
DO ji = 1, jpi
  IF (not_const) scalar = array(ji)
  array2(ji) = const * array3(ji)
END DO
DO ji = 1, jpi
  IF (not_const) scalar = array4(ji)
  array5(ji) = const * array3(ji)
END DO

(where not_const is some loop-invariant boolean) is difficult to handle. What we really need is a 'manual override' that says (to PSyclone) that this loop is safe to parallelise. It's then up to PSyclone to persuade the compiler of that.

arporter commented 5 years ago

However, even that functionality requires (some) dependence/symbol information as we need to know which (if any) scalars are written to within a loop.

arporter commented 3 years ago

Related to this, Simon has reported:

"The use of variables zalph2 and z1_alph2 in two horizontal loops of the ice_dyn_rhg_evp subroutine of the icedyn_rhg_evp module appears to prevent the parallelisation of the first of these loop nests, with a substantial negative impact on the efficiency of the resulting accelerated executable. My current workaround is to create copies of zalph2 and z1_alph2 and use independent sets of these two variables in each of the two loops, respectively. Declaring zalph2 and z1_alph2 as "PRIVATE" in the "ACC LOOP" directive of the affected horizontal loop might enable parallelisation without code refactoring, but would require the initialisation of the private variables with the values the corresponding variables hold outside the parallelised loop nest. Does the "PRIVATE" clause of an "ACC LOOP" directive ensure initialisation, or would one have to embed it within an "ACC PARALLEL FIRSTPRIVATE(zalph2, z1_alph2)" directive?"

arporter commented 1 year ago

We now have the OMPPrivateClause node. This needs generalising so that we support OpenACC too.

arporter commented 8 months ago

I second myself - we need to be able to mark arrays as private variables. (Although sometimes we'd be better off increasing their rank by one and indexing into them using the index of the loop we're parallelising.)