stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License
104 stars 27 forks source link

(Closes #2027) fix for ArrayMixin._effective_shape bug #2685

Open arporter opened 1 month ago

arporter commented 1 month ago

We had a bug which would cause a crash if we had an array access like a(b) where b is actually an array.

In testing this, I also realised that the ArrayAssignment2LoopsTrans transformation would incorrectly convert something like:

c(:) = a(b)

to

do idx = 1, SIZE(c,1)
   c(idx) = a(b)
end do

This should in fact be:

do idx = 1, SIZE(c,1)
  c(idx) = a(b(idx))
end do

This also means we have a bug in examples/nemo/scripts/utils.py because it should have converted a(b) to a(b(:)).

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.85%. Comparing base (6aad1d4) to head (eb4f8c8).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #2685 +/- ## ========================================== - Coverage 99.86% 99.85% -0.02% ========================================== Files 354 354 Lines 49112 49123 +11 ========================================== + Hits 49048 49052 +4 - Misses 64 71 +7 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

arporter commented 1 month ago

The good news: the integration tests pass. Bad news: image i.e. the performance of NEMO with OMP offload and OMP CPU threading has taken a big hit.

arporter commented 1 month ago

I've diffed the generated OMP code with that from master and the only differences are in a comment where we don't sort the order of 'impure' routine names. Perhaps the machine just happened to be busy? Oh, there are a couple of differences in some routine calls:

diff psycloned-openmp_cpu-master/icbdyn.f90 psycloned-openmp_cpu/icbdyn.f90
108c108
<       pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
>       pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_cpu-master/icbrst.f90 psycloned-openmp_cpu/icbrst.f90
341c341
<         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
>         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))

but they look reassuring (as in, something has actually changed) and shouldn't affect performance at all.

arporter commented 1 month ago

I re-ran the NEMO tests and got: image so it may just be machine noise. I'll check the generated OMP-offload code against that from master too.

arporter commented 1 month ago

The only diffs between master and this branch in the generated OMP-offoad version of NEMO4 are:

$ diff psycloned-openmp_gpu{,-2027}
diff psycloned-openmp_gpu/icbdyn.f90 psycloned-openmp_gpu-2027/icbdyn.f90
111c111
<       pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
>       pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_gpu/icbrst.f90 psycloned-openmp_gpu-2027/icbrst.f90
365c365
<         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
>         nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))

Therefore I declare this ready for review. One for @sergisiso, @AidanChalk or @hiker.

sergisiso commented 1 month ago

I brought the PR to master in order to run the integration tests again (it needed to use updated modules). With an empty system this PR shows no performance degradation 👍

sergisiso commented 1 month ago

I am finding this same issue in lbclnk.f90 of NEMOv5 in my two open PRs, merging this branch into them fixes it.

arporter commented 1 month ago

Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK, e.g.:

  WHERE( picefr(:,:) > 1.e-10 )
    zevap_ice(:,:,1) = snow(:,3,:) * frcv(jpr_ievp)%z3(:,:,1) / picefr(:,:)

Here, we don't know the type of jpr_ievp (because it is imported) and therefore we refuse to find the shape of frcv(jpr_ievp)%z3(:,:,1). However, we can see that the whole expression already contains two ranges (:) and therefore we can conclude that jpr_ievp must be a scalar. This is going to be hard to deal with and also will conflict with @LonelyCat124 's work in #2687. I therefore don't really want to touch it until that PR is on (unless I can find a simple fix).

sergisiso commented 2 days ago

@arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that a "lbclnk.f90", # TODO #2685: effective shape bug was added to examples/nemo/scripts/omp_cpu_trans.py as this was affecting NEMOv5. We want to make sure this PR resolves it and we can delete the exclusion.

sergisiso commented 2 days ago

Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK Here, we don't know the type of jpr_ievp (because it is imported) and therefore we refuse to find the shape of

By the way, @LonelyCat124 also tightened the WHEREs with imported symbols and I believe he had to bring "jpr_ievp" and other symbols as locals in some tests. We also seen in the integration tests that this did not affect NEMO performance. So, your issue may be resolved now.

arporter commented 1 day ago

@arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that a "lbclnk.f90", # TODO #2685: effective shape bug was added to examples/nemo/scripts/omp_cpu_trans.py as this was affecting NEMOv5. We want to make sure this PR resolves it and we can delete the exclusion.

In what way did the bug manifest with this file? I've removed this exclusion and re-run the integration tests which were fine apart from the race condition affecting the OMP CPU results.

sergisiso commented 1 day ago

It was a pyclone exception while processing the file. I knew this branch resolved it because I tried merging it even before the first review and I saw it was fixed here, that's why I added the TODO. It should be good now.

arporter commented 1 hour ago

I have some indirect coverage changes that I need to address.