stfc / PSycloneBench

Various benchmarks used to inform PSyclone optimisations
BSD 3-Clause "New" or "Revised" License

(closes #91) Update Openmp GPU and add explicit memory versions to ACC #93

Closed sergisiso closed 1 year ago

sergisiso commented 1 year ago

@arporter I am updating the OpenMP and OpenACC versions of NemoLite2D and tracer_advection so that they work with the latest PSyclone changes (including an explicit memory management version).

Doing this, I realized that the tra_adv acc_kernels_trans script also expands loops and adds explicit loop parallelism. This is great; I think this will be the fastest version and what we should aim for in NEMO. But I am unsure about the naming, because the name gives the impression that it only adds ACC kernels directives.

What do you think of breaking it into 2 scripts, and again by 2 for the data model, so we have:

- acc_kernels_managed_memory_trans.py
- acc_kernels_explicit_memory_trans.py
- acc_mixed_managed_memory_trans.py
- acc_mixed_explicit_memory_trans.py
- acc_loops_managed_memory_trans.py
- acc_loops_explicit_memory_trans.py

The kernels + loop expansion and explicit loop parallelism version would be acc_mixed.

I find it easier to compile and test all of them this way (instead of using parameters inside the script), and the common functionality can be placed in utils.py.
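As a rough sketch of the layout proposed above: the six script names are the ones listed, but `script_name`, `describe` and the step flags are hypothetical illustrations, not the actual contents of utils.py, and nothing here calls PSyclone. The assumption that acc_loops means explicit loop directives only (no kernels regions) is mine.

```python
from itertools import product

# Parallelisation strategies and data models named in the proposal.
STRATEGIES = ("kernels", "mixed", "loops")
MEMORY_MODELS = ("managed", "explicit")


def script_name(strategy, memory):
    """Build a script filename following the proposed naming scheme."""
    return f"acc_{strategy}_{memory}_memory_trans.py"


def describe(strategy, memory):
    """Hypothetical stand-in for a shared helper that could live in utils.py.

    It only records which steps each thin script would request; it does
    not apply any PSyclone transformation.
    """
    return {
        # Assumption: "mixed" combines kernels regions with explicit loops.
        "kernels_regions": strategy in ("kernels", "mixed"),
        "explicit_loops": strategy in ("mixed", "loops"),
        "explicit_data_movement": memory == "explicit",
    }


# One entry per script: 3 strategies x 2 data models = 6 scripts.
SCRIPTS = {
    script_name(s, m): describe(s, m)
    for s, m in product(STRATEGIES, MEMORY_MODELS)
}

if __name__ == "__main__":
    for name, steps in SCRIPTS.items():
        print(name, steps)
```

With this shape, each script stays a thin wrapper with fixed arguments, so every variant can be compiled and tested by name without editing parameters.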

I will do it in this PR if you agree.

arporter commented 1 year ago

> I will do it in this PR if you agree.

That sounds great to me :-)

sergisiso commented 1 year ago

OK, the picture I am getting is that, for this benchmark, kernels is slightly faster than explicit loops, and adding explicit loops does not improve on kernels in the mixed version. Also, explicit data movement makes the kernels version slightly faster, but not the explicit-loops version. OpenMP is behind OpenACC, even in the version that looks the same. [image]

These differences are all very small; compared with the CPU, all of the versions are much faster. [image]
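For comparisons like the one above, a small helper can turn wall-clock timings into speedups relative to a chosen baseline. This is only a sketch: the function name and the dictionary layout are hypothetical, and no timings from this PR are reproduced here.

```python
def speedups(timings, baseline):
    """Return baseline_time / version_time for every version.

    `timings` maps a version name to its wall-clock time in seconds.
    Values above 1.0 mean that version is faster than the baseline.
    """
    if baseline not in timings:
        raise KeyError(f"unknown baseline: {baseline}")
    if any(t <= 0 for t in timings.values()):
        raise ValueError("timings must be positive")
    base = timings[baseline]
    return {name: base / t for name, t in timings.items()}
```

Called with the CPU run as the baseline, the result makes both the small GPU-to-GPU differences and the large GPU-to-CPU gap visible in one table.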

sergisiso commented 1 year ago

@arporter This is ready to review. In addition to the tracer_advection updates, I also introduced a NemoLite2D OpenMP offloading script. It still doesn't work (like the ACC equivalent?), but the PSyclone part and the compilation do, and it is convenient to have it in the repo to try it on different machines/compilers.

sergisiso commented 1 year ago

@arporter This is ready for another look. As suggested, I split the NemoLite2D part into another PR. The confusion/replication I previously had in the OpenACC mixed versions was because I was trying to preserve the previous code. But it is much nicer using utils.py (and it has better collapsing clauses), so I updated all scripts to use utils.py now.

sergisiso commented 1 year ago

@arporter This is ready for review again; previous comments are addressed, and I leave further optimizations to a separate PR.

sergisiso commented 1 year ago

@arporter Just a reminder that this is ready for another review :)

sergisiso commented 1 year ago

@arporter Sorry for taking a long time to respond to this, but this should be ready for another review.

(In reality I didn't make any changes, because the NemoLite2D changes are now in another branch and I couldn't reproduce the pylint issue in my environment.)

LonelyCat124 commented 1 year ago

I'm using this branch for CPU OpenMP parallelisation of tracer advection and it also hits https://github.com/stfc/PSyclone/issues/2101 for me. This is probably not an issue with this branch, but it is worth bearing in mind.