arporter closed 1 year ago
Performance of this version is currently not very good because some loops are left on the CPU due to complex loop bounds (involving UBOUND/LBOUND).
I was kind of expecting this. I will review it this afternoon, and once https://github.com/stfc/PSyclone/pull/1790 is merged this can be fixed. Looking at the source, I think it should be feasible to bring everything to the GPU, as it looks very much like parts of NEMO that I know are offloaded.
I hadn't noticed the checksum problem. It turns out that OMPTargetTrans doesn't exclude CodeBlock nodes (and it should), so the final loop that computes the checksum gets put in an omp target block. Quite what the compiler does with that, I'm not sure. For now I've extended the transformation script to exclude loops that contain CodeBlocks, and now I get a non-zero checksum. I'll make a PSyclone issue for this.
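For anyone following along, here is a rough sketch of the kind of filter described above: pick the outermost loops and skip any that have a CodeBlock somewhere beneath them. It is not the actual script from this PR. The `Node`, `Loop`, `Assignment` and `CodeBlock` classes and the `loops_to_offload` helper are hypothetical stand-ins written so the snippet runs on its own; a real script would use PSyclone's own node classes and apply the target transformation to the selected loops.

```python
# Stand-in node classes for illustration only. These are NOT PSyclone's
# PSyIR classes; they only mimic a tree with a walk() method.
class Node:
    def __init__(self, children=None):
        self.children = list(children or [])

    def walk(self, node_type):
        """Return this node and all descendants that are instances of node_type."""
        found = [self] if isinstance(self, node_type) else []
        for child in self.children:
            found.extend(child.walk(node_type))
        return found


class Loop(Node):
    def __init__(self, name, children=None):
        super().__init__(children)
        self.name = name


class Assignment(Node):
    pass


class CodeBlock(Node):
    """Represents code the front end could not parse into known nodes."""


def loops_to_offload(root):
    """Select outermost loops with no CodeBlock anywhere beneath them (hypothetical helper)."""
    selected = []
    for child in root.children:
        if isinstance(child, Loop):
            # A loop containing a CodeBlock is left on the CPU entirely.
            if not child.walk(CodeBlock):
                selected.append(child)
        else:
            # Not a loop: keep searching inside it for outermost loops.
            selected.extend(loops_to_offload(child))
    return selected


# A toy schedule: one ordinary loop nest and one checksum loop with a CodeBlock.
schedule = Node([
    Loop("advection", [Loop("inner", [Assignment()])]),
    Loop("checksum", [CodeBlock()]),
])
names = [loop.name for loop in loops_to_offload(schedule)]
```

With this toy schedule only the "advection" loop is selected, so the checksum loop stays out of any target region.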
Checksum now matches for serial and OMP offload versions. Ready for another look.
Ready for another look now.
Ready for review now. As suggested in #86, I've just copied in the script from PSyclone/examples/nemo/eg1 for now. This will need updating once the new transformations are on master.