stfc / PSycloneBench

Various benchmarks used to inform PSyclone optimisations
BSD 3-Clause "New" or "Revised" License

Auto-generated OpenACC and OpenMP #12

Open arporter opened 6 years ago

arporter commented 6 years ago

In PSyclone issue 170 (https://github.com/stfc/PSyclone/issues/170) we are adding support for OpenACC to the GOcean 1.0 API. The NEMOLite2D application in PSycloneBench requires some updating in order for this to work and we shall do that here.

arporter commented 6 years ago

Auto-generation of the OpenACC version of NEMOLite2D is now working.

arporter commented 5 years ago

I've updated the OpenACC transformation script so that it now adds the !$acc routine directive to every kernel. I've also updated the Makefile to ensure that it builds all of the generated kernels. This was difficult and will have broken the build for all the other targets, as I've altered the way we get the list of kernel names. The generated code compiles without OpenACC enabled. However, when OpenACC is enabled the code will not compile because some of the kernels use modules from the GOcean infrastructure (in order to get some parameter values).
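To illustrate what "adding the !$acc routine directive to every kernel" means in the generated source, here is a minimal text-based sketch in Python. It is not the PSyclone transformation script referred to above, which works on PSyclone's internal representation rather than on raw text. The function name add_acc_routine is hypothetical, and placing the directive immediately after the subroutine statement is an assumption of this sketch.

```python
import re

# Matches the start of a Fortran subroutine statement. Lines beginning with
# "end subroutine" do not match because they start with "end".
# Limitation (assumed acceptable for a sketch): prefixes such as "pure" and
# subroutine statements continued with "&" are not handled.
_SUBROUTINE = re.compile(r"^\s*subroutine\s+\w+", re.IGNORECASE)


def add_acc_routine(source: str) -> str:
    """Return `source` with '!$acc routine' inserted after each subroutine
    statement, unless the directive is already the next line."""
    lines = source.splitlines()
    out = []
    for idx, line in enumerate(lines):
        out.append(line)
        if not _SUBROUTINE.match(line):
            continue
        following = lines[idx + 1] if idx + 1 < len(lines) else ""
        if following.strip().lower().startswith("!$acc routine"):
            continue  # directive already present, keep the edit idempotent
        indent = line[: len(line) - len(line.lstrip())]
        out.append(indent + "  !$acc routine")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical kernel, used only to show the effect.
    kernel = (
        "subroutine my_kernel_code(ji, jj, field)\n"
        "  integer, intent(in) :: ji, jj\n"
        "  real, intent(inout) :: field(:,:)\n"
        "end subroutine my_kernel_code\n"
    )
    print(add_acc_routine(kernel))
```

Running the function a second time on its own output leaves the text unchanged, which is convenient if a build regenerates kernels repeatedly.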

arporter commented 5 years ago

Compiling the manual OpenACC version with version 18.4 of the PGI compiler at -O2 fails with:

pgf90 -O2 -g -Minfo=all -acc -ta=tesla:cc70 -I../../../../../shared/dl_esm_inf/finite_difference/src -I../../../../../shared/dl_timer/src -c initialisation_mod.f90
initialisation:
     27, Memory set idiom, loop replaced by call to __c_mset8
     28, Memory set idiom, loop replaced by call to __c_mset8
     29, Memory set idiom, loop replaced by call to __c_mset8
     32, Memory zero idiom, loop replaced by call to __c_mzero8
     37, FMA (fused multiply-add) instruction(s) generated
     49, FMA (fused multiply-add) instruction(s) generated
     59, Memory zero idiom, loop replaced by call to __c_mzero8
     62, Memory zero idiom, loop replaced by call to __c_mzero8
/tmp/pgf90ftFbpOhEjuSF.s: Assembler messages:
/tmp/pgf90ftFbpOhEjuSF.s:3549: Error: unsupported instruction `vmovd'
/tmp/pgf90ftFbpOhEjuSF.s:3599: Error: unsupported instruction `vmovd'
/tmp/pgf90ftFbpOhEjuSF.s:3649: Error: unsupported instruction `vmovd'
make[1]: *** [Makefile:70: initialisation_mod.o] Error 2

but compiling at -O1 works. According to pgfortran -help -O2, -O2 is equivalent to -Mvect=sse -Mcache_align -Mpre, so I tried -O2 -Mnovect -Mnocache_align -Mnopre, but that didn't change the error. I was hoping to get -O2 working because I wanted to see whether IPA would remove the need for us to module-inline accelerated kernels. However, requesting IPA automatically raises the optimisation level to -O2 and the compiler falls over.

sergisiso commented 1 year ago

I will reuse this old issue to track generating the OpenACC and OpenMP versions with PSyclone; both are almost there.

arporter commented 5 months ago

As discussed, I've just tried turning OpenACC on for #100 and it very nearly works, although the resulting Fortran doesn't compile. The only reason for this is that the module-inlined versions of the bc_flather kernels still have a wildcard import from a module that brings in g, which is now being passed as an argument to the kernel (thanks to KernelImportsToArguments). I've stepped through that transformation in the debugger and it does remove the Container symbol from the Kernel's symbol table, so I don't understand how the import still appears in the generated Fortran.
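As a quick way to spot this kind of leftover in generated code, the following Python sketch lists the modules that a piece of Fortran source imports without an only: list. It is a plain regular-expression scan, not part of PSyclone. The function name wildcard_imports, the module name some_params_mod and the sample kernel are all hypothetical. Statements of the form "use, intrinsic :: ..." and continued lines are deliberately not handled.

```python
import re

# "use <module>" optionally followed by ", only:". Group 2 is None when
# there is no only-list, i.e. when the import is a wildcard import.
_USE = re.compile(r"^\s*use\s+(\w+)\s*(,\s*only\s*:)?", re.IGNORECASE)


def wildcard_imports(source: str) -> list:
    """Return the names of modules imported without an 'only:' list."""
    modules = []
    for line in source.splitlines():
        code = line.split("!", 1)[0]  # ignore trailing comments
        match = _USE.match(code)
        if match and match.group(2) is None:
            modules.append(match.group(1))
    return modules


# Hypothetical module-inlined kernel: g is an argument, but the wildcard
# import could bring in a second g and so clash with it.
SAMPLE = """\
subroutine bc_flather_u_code(ji, jj, g)
  use some_params_mod
  use kind_params_mod, only: go_wp
  integer, intent(in) :: ji, jj
  real(go_wp), intent(in) :: g
end subroutine bc_flather_u_code
"""

if __name__ == "__main__":
    for name in wildcard_imports(SAMPLE):
        print("wildcard import of", name)
```

Any module reported for a kernel whose imports were meant to be converted into arguments is a candidate for the clash described above.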