stfc / PSyclone

Domain-specific compiler and code transformation system for Finite Difference/Volume/Element Earth-system models in Fortran
BSD 3-Clause "New" or "Revised" License

OpenACC kernels transformation for stpctl.f90 #2601

Open nmnobre opened 5 months ago

nmnobre commented 5 months ago

Hey @sergisiso,

With nvfortran 24.5, compilation fails with:

NVFORTRAN-S-1000-Call in OpenACC region to procedure 'nf90_enddef' which has no acc routine information (.../nemo-4.0_mirror_SI3_GPU/cfgs/SPITZ12_OCE_GPU/WORK/stpctl.f90: 91)

Inspecting stpctl.f90, you can see that we're placing a netCDF call inside a kernels construct:

     91         !$acc kernels
     92         istatus = nf90_enddef(idrun)
     93         zmax(8:) = 0._wp
     94         !$acc end kernels

This isn't a problem with nvfortran 23.7 which, unfortunately, we're stuck with for CI until we do something about https://github.com/stfc/PSyclone/issues/1596.

sergisiso commented 5 months ago

How does this succeed for 23.7? 🤔

Regarding the issue, I need to see the symbol type of nf90_enddef, but the problem with these expressions is often that we don't see the declaration, so we don't know whether they are functions or arrays. We tend to assume arrays, and so the acc kernels construct sees nothing wrong.

One solution would be to be more conservative and make them CodeBlocks, but unfortunately this affects lots of code that we currently process fine. Another is to let resolve_imports correct the cases where we were wrong (and provide netcdf in the -I include path of the psyclone command). The simplest is to search for the "nf90_enddef" symbol inside acc regions and disallow it, but this will still fail the next time we encounter a similar issue.
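The last option (and the extension arporter suggests below) amounts to a deny-list check before enclosing a region. A minimal Python sketch of the idea, not using the real PSyclone API — the list contents and function name here are illustrative assumptions:

```python
# Hypothetical deny-list of host-only library calls that must never end up
# inside an "!$acc kernels" region. In practice this would be grown as new
# offenders (like nf90_enddef in stpctl.f90) are found.
HOST_ONLY_CALLS = {"nf90_enddef", "nf90_close", "nf90_put_var"}

def safe_to_offload(call_names):
    """Return False if any named call is on the host-only deny-list.

    `call_names` would be the routine names referenced in a candidate
    region (Fortran is case-insensitive, hence the lower-casing).
    """
    return not ({name.lower() for name in call_names} & HOST_ONLY_CALLS)
```

With this, the stpctl.f90 region above would be rejected, since `safe_to_offload(["nf90_enddef"])` is False, while a region containing only intrinsics such as `sign` and `max` would still be offloaded.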

arporter commented 4 months ago

Even simpler, you can extend the script section that records the names of various functions and then we refuse to put them in regions.

nmnobre commented 4 months ago

> Even simpler, you can extend the script section that records the names of various functions and then we refuse to put them in regions.

...as suggested over lunch last week :) I just didn't have the time to get to it, thanks for the reminder! Unless you are offering to take this over?

arporter commented 4 months ago

> ...as suggested over lunch last week :) I just didn't have the time to get to it, thanks for the reminder! Unless you are offering to take this over?

No, it was just that I happened on this Issue while searching for another one. I don't recall the lunchtime conversation! There's no rush. Actually, @sergisiso was saying to me yesterday that he thinks he can make the OpenACC nemo4 version go faster so maybe he could do this when he does that?

sergisiso commented 4 months ago

I was talking about the ACC loops and the OpenMP version. This is an issue for the kernels version, which is not related, as it's not inside a loop.

sergisiso commented 4 months ago

Oh, but this is a one-line change? We could do it in any PR that is coming soon.

sergisiso commented 4 months ago

Updating the compiler has made the NEMOv5 results differ in the 11th digit:

it :      10    |ssh|_max:0.6483453724938688
it :      10    |ssh|_max:0.6483453724882898

I guess this is expected, and we should also introduce a comparison tolerance (as we do for NEMOv4).
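The kind of tolerance comparison meant here, sketched in Python on the two |ssh|_max values printed above (the choice of rel_tol is purely illustrative):

```python
import math

# The two |ssh|_max values at it=10, before and after the compiler update.
old = 0.6483453724938688
new = 0.6483453724882898

# Exact comparison fails, but the values agree to ~1e-11 relative error,
# so a comparison with a (generous, illustrative) relative tolerance passes.
assert old != new
assert math.isclose(old, new, rel_tol=1e-10)
```

The tolerance has to be chosen carefully: too tight and benign compiler-version noise like the above fails the test, too loose and a real regression slips through.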

arporter commented 4 months ago

I was deliberately not using a tolerance to make this test more sensitive (although I had to back off to -O1). For passthrough on CPU, I'd prefer to avoid a tolerance. I think a change like this when changing compiler version is fine.

sergisiso commented 4 months ago

NEMOv5 passthrough and NEMOv4 OpenMP offload succeeded but the NEMOv4 OpenACC kernels produced a:

cd /home/gh_runner/NEMO/cfgs/SPITZ12_openacc_kernels/EXP00; ./nemo
make: *** [Makefile:104: run-openacc_kernels] Segmentation fault (core dumped)

(https://github.com/stfc/PSyclone-mirror/actions/runs/9682071608/job/26714200597) Unless you know what it could be, we will need to repeat it with the debugger to see where it fails.

arporter commented 4 months ago

> Unless you know what it could be, we will need to repeat it with the debugger to see where it fails.

I'm afraid I have no idea. Perhaps we should add the necessary env. variables to the workflow so that we get a stacktrace in the event of a crash? (So long as they don't harm performance.)