Open · arporter opened 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.85%. Comparing base (`6aad1d4`) to head (`eb4f8c8`).
The good news: the integration tests pass. The bad news: the performance of NEMO with both OMP offload and OMP CPU threading has taken a big hit.
I've diffed the generated OMP code against that from master and the only differences are in a comment, where we don't sort the order of 'impure' routine names. Perhaps the machine just happened to be busy? Actually, there are also a couple of differences in some routine calls:
```diff
diff psycloned-openmp_cpu-master/icbdyn.f90 psycloned-openmp_cpu/icbdyn.f90
108c108
< pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
> pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_cpu-master/icbrst.f90 psycloned-openmp_cpu/icbrst.f90
341c341
< nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
> nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))
```
but they look reassuring (as in, something has actually changed) and shouldn't affect performance at all.
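When comparing generated sources, changes like `glamt` → `glamt(:,:)` are purely syntactic: a whole-array actual argument and an explicit full section denote the same data. A small (hypothetical, not part of PSyclone) normaliser can strip such full sections before diffing so that only substantive differences show up:

```python
import re

# Sketch: rewrite whole-array sections like "glamt(:,:)" or "nstrt3(:)"
# back to the bare array name, so that this purely syntactic difference
# does not appear when diffing two generated Fortran sources.
# Hypothetical helper for comparing outputs, not part of PSyclone.

def strip_full_sections(line):
    """Rewrite name(:), name(:,:), ... as the bare array name.

    Sections with non-range indices (e.g. z3(:,:,1)) are left alone,
    since they are genuine subsections rather than whole arrays."""
    return re.sub(r"\b(\w+)\(\s*:(?:\s*,\s*:)*\s*\)", r"\1", line)

print(strip_full_sections("pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)"))
# → pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
```

Running both versions of the generated code through such a filter before `diff` would leave only the comment-ordering difference visible.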
I re-ran the NEMO tests and the timings were comparable, so it may just be machine noise. I'll check the generated OMP-offload code against that from master too.
The only diffs between master and this branch in the generated OMP-offload version of NEMO4 are:
```diff
$ diff psycloned-openmp_gpu{,-2027}
diff psycloned-openmp_gpu/icbdyn.f90 psycloned-openmp_gpu-2027/icbdyn.f90
111c111
< pt%lon = icb_utl_bilin_x(glamt,pt%xi,pt%yj)
---
> pt%lon = icb_utl_bilin_x(glamt(:,:),pt%xi,pt%yj)
diff psycloned-openmp_gpu/icbrst.f90 psycloned-openmp_gpu-2027/icbrst.f90
365c365
< nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3,nlngth3)
---
> nret = nf90_put_var(ncid,nsiceid,griddata,nstrt3(:),nlngth3(:))
```
Therefore I declare this ready for review. One for @sergisiso, @AidanChalk or @hiker.
I brought the PR up to date with master in order to run the integration tests again (it needed to use updated modules). With an empty system this PR shows no performance degradation 👍
I am seeing this same issue in lbclnk.f90 of NEMOv5 in my two open PRs; merging this branch into them fixes it.
Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK, e.g.:
```fortran
WHERE( picefr(:,:) > 1.e-10 )
   zevap_ice(:,:,1) = snow(:,3,:) * frcv(jpr_ievp)%z3(:,:,1) / picefr(:,:)
```
Here, we don't know the type of `jpr_ievp` (because it is imported) and therefore we refuse to find the shape of `frcv(jpr_ievp)%z3(:,:,1)`. However, we can see that the whole expression already contains two ranges (`:`) and therefore we can conclude that `jpr_ievp` must be a scalar. This is going to be hard to deal with and will also conflict with @LonelyCat124's work in #2687. I therefore don't really want to touch it until that PR is in (unless I can find a simple fix).
@arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that a `"lbclnk.f90", # TODO #2685: effective shape bug` exclusion was added to `examples/nemo/scripts/omp_cpu_trans.py` as this was affecting NEMOv5. We want to make sure this PR resolves it so we can delete the exclusion.
> Unfortunately, my tightening-up of checks on the types of expressions appearing in array-indices now means that the WHERE handling is refusing some code that is/was OK. Here, we don't know the type of `jpr_ievp` (because it is imported) and therefore we refuse to find the shape of

By the way, @LonelyCat124 also tightened the handling of WHEREs with imported symbols, and I believe he had to bring `jpr_ievp` and other symbols in as locals in some tests. We also saw in the integration tests that this did not affect NEMO performance. So your issue may be resolved now.
> @arporter The WHERE branch has been merged, so this PR can probably continue now. Also note that a `"lbclnk.f90", # TODO #2685: effective shape bug` was added to `examples/nemo/scripts/omp_cpu_trans.py` as this was affecting NEMOv5. We want to make sure this PR resolves it and we can delete the exclusion.
In what way did the bug manifest with this file? I've removed this exclusion and re-run the integration tests which were fine apart from the race condition affecting the OMP CPU results.
It was a PSyclone exception while processing the file. I knew this branch resolved it because I tried merging it in even before the first review and saw that it was fixed here; that's why I added the TODO. It should be good now.
I have some indirect coverage changes that I need to address.
We had a bug which would cause a crash if we had an array access like `a(b)` where `b` is actually an array. In testing this, I also realised that the ArrayAssignment2LoopsTrans transformation would incorrectly convert something like:

to

This should in fact be:

This also means we have a bug in `examples/nemo/scripts/utils.py`, because it should have converted `a(b)` to `a(b(:))`.
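The `a(b)` → `a(b(:))` rewrite can be sketched as below. This is a hypothetical illustration of the fix being described, not the actual change in PSyclone's frontend or in `utils.py`; the helper name and its string-based representation are invented:

```python
# Sketch: when a bare index expression in an array access names a
# symbol that is itself a (rank-1) array, make the implied section
# explicit by rewriting "b" as "b(:)". This is the conversion that
# utils.py should have performed for accesses like a(b).

def make_sections_explicit(indices, array_symbols):
    """Rewrite any bare index naming a known rank-1 array as an
    explicit full section 'name(:)'; leave other indices alone."""
    return [f"{ix}(:)" if ix in array_symbols else ix for ix in indices]

# a(b) where b is declared as an array becomes a(b(:)):
print(make_sections_explicit(["b"], {"b"}))  # → ['b(:)']
```

With the section made explicit, downstream transformations can see that the access is itself array-valued rather than a single element.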