arporter opened this issue 5 years ago.
For OpenMP, we automatically generate a list of private variables for each parallel region at code-generation time. This works for a generated PSy layer because we know what the scalar variables are and that it is safe to parallelise the loop. For frontends that work with existing Fortran code (such as NEMO), we don't have this information. Even once we have dependence analysis, code like:
```fortran
scalar = 1.0
DO ji = 1, jpi
   IF (not_const) scalar = array(ji)
   array2(ji) = const * array3(ji)
END DO
DO ji = 1, jpi
   IF (not_const) scalar = array4(ji)
   array5(ji) = const * array3(ji)
END DO
```
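(where `not_const` is some loop-invariant boolean) is difficult to handle: `scalar` may be written in every iteration, so unless an analysis can show that its value is not needed after the loop, it must assume a dependence between iterations. What we really need is a 'manual override' that tells PSyclone that this loop is safe to parallelise. It's then up to PSyclone to persuade the compiler of that.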
However, even that functionality requires (some) dependence/symbol information, because we need to know which scalars (if any) are written within a loop.
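To make that requirement concrete, here is a minimal sketch of the "which scalars are written within a loop" query in Python (PSyclone's own implementation language), over a deliberately tiny loop-body representation. The `Assign` class and `scalars_written_in_loop` function are hypothetical illustrations, not PSyclone's PSyIR or dependence-analysis API; a real implementation would presumably walk the PSyIR and use its symbol information to tell scalars from arrays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """Hypothetical, simplified assignment in a loop body.

    target: name of the variable written.
    reads: names read on the right-hand side or in a guarding IF.
    conditional: True if the assignment sits inside an IF.
    """
    target: str
    reads: tuple = ()
    conditional: bool = False


def scalars_written_in_loop(body, array_names):
    """Return the names of scalars that may be written in the loop body.

    Any written name that is not a known array is treated as a scalar.
    Conditional writes count: the scalar *may* be written.
    """
    return {stmt.target for stmt in body if stmt.target not in array_names}
```

Applied to the first loop of the example above (`IF (not_const) scalar = array(ji)` followed by `array2(ji) = const * array3(ji)`), only `scalar` is reported, since every other written name is an array.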
Related to this, Simon has reported:
"The use of variables zalph2 and z1_alph2 in two horizontal loops of the ice_dyn_rhg_evp subroutine of the icedyn_rhg_evp module appears to prevent the parallelisation of the first of these loop nests, with a substantial negative impact on the efficiency of the resulting accelerated executable. My current workaround is to create copies of zalph2 and z1_alph2 and use independent sets of these two variables in each of the two loops, respectively. Declaring zalph2 and z1_alph2 as "PRIVATE" in the "ACC LOOP" directive of the affected horizontal loop might enable parallelisation without code refactoring, but would require the initialisation of the private variables with the values the corresponding variables hold outside the parallelised loop nest. Does the "PRIVATE" clause of an "ACC LOOP" directive ensure initialisation, or would one have to embed it within an "ACC PARALLEL FIRSTPRIVATE(zalph2, z1_alph2)" directive? "
We now have the `OMPPrivateClause` node. This needs generalising so that we support OpenACC too.
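Generalising the clause also bears on Simon's initialisation question. In both OpenMP and OpenACC a `private` copy starts uninitialised, whereas a `firstprivate` copy is initialised with the value the variable held before the construct; OpenACC accepts `firstprivate` on the `parallel` construct but not on `loop`. Which treatment a scalar needs can be read off the loop body. The sketch below assumes the same kind of hypothetical `Assign` representation (an illustration, not PSyclone's API) and handles only straight-line bodies with simple IFs. It labels each scalar `private` if it is always assigned before being read within an iteration, `firstprivate` if it is only read, and `dependence` if it is written but may be read before being assigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """Hypothetical, simplified assignment: target = f(reads), maybe under an IF."""
    target: str
    reads: tuple = ()
    conditional: bool = False


def classify_scalars(body, scalar_names):
    """Suggest a data-sharing treatment for each scalar used in one loop body.

    'private'      - written in the loop and always assigned before being read
                     within an iteration, so no initial value is needed.
    'firstprivate' - only read in the loop; each private copy just needs the
                     value the variable held before the loop.
    'dependence'   - written in the loop but possibly read before being assigned
                     in the same iteration, so a value may flow between
                     iterations and privatising it is only valid if the user
                     asserts that it is.
    Names not in scalar_names (arrays, loop indices) are ignored.
    """
    written = {s.target for s in body if s.target in scalar_names}
    definitely_set = set()  # scalars unconditionally assigned so far
    exposed = set()         # scalars possibly read before any assignment
    used = set()
    for stmt in body:
        # Reads in a statement happen before its own assignment takes effect.
        for name in stmt.reads:
            if name in scalar_names:
                used.add(name)
                if name not in definitely_set:
                    exposed.add(name)
        if stmt.target in scalar_names:
            used.add(stmt.target)
            # Only an unconditional write guarantees a value for later reads.
            if not stmt.conditional:
                definitely_set.add(stmt.target)
    result = {}
    for name in sorted(used):
        if name not in written:
            result[name] = "firstprivate"
        elif name in exposed:
            result[name] = "dependence"
        else:
            result[name] = "private"
    return result
```

The `dependence` category is the case where, as this issue argues, only a manual override from the user can justify privatising the variable.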
I second myself - we need to be able to mark arrays as private variables. (Although sometimes we'd be better off increasing their rank by one and indexing into them using the index of the loop we're parallelising.)
Sometimes a compiler is unable to determine whether a scalar variable that is written within a loop is then used after the loop, and this prevents it from parallelising the loop. If the variable's value really isn't needed after the loop, this can be worked around by instructing the compiler to make the variable private. The `ACCLoopTrans` transformation needs extending to support this.
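The liveness question the compiler cannot answer can be sketched the same way. Again assuming a hypothetical, straight-line `Assign` representation (not PSyclone's API), a scalar written in the loop is safe to make private only if the code after the loop overwrites it before reading it. Reaching the end of the known code is treated conservatively as "still live", since the variable might, for example, be a module variable or a dummy argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    """Hypothetical, simplified assignment: target = f(reads), maybe under an IF."""
    target: str
    reads: tuple = ()
    conditional: bool = False


def is_live_after_loop(name, following):
    """Return True if `name` may be read after the loop before being overwritten.

    `following` lists the statements executed after the loop, in order
    (straight-line code only in this simplified sketch).
    """
    for stmt in following:
        if name in stmt.reads:
            return True   # the value computed in the loop may be used
        if stmt.target == name and not stmt.conditional:
            return False  # overwritten before any use: the loop's value is dead
    # Conservative: without more information, assume it is used later.
    return True


def safe_to_privatise(loop_written_scalars, following):
    """Scalars written in the loop whose values are not needed after it."""
    return {n for n in loop_written_scalars
            if not is_live_after_loop(n, following)}
```

If the second loop of the example at the top of this issue is modelled as the statements following the first loop, the only later write to `scalar` is the conditional one, so the sketch (like a compiler) must treat `scalar` as live after the first loop. That is exactly the situation in which the user needs a way to tell `ACCLoopTrans` to make it private.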