arporter closed 2 years ago
OpenMP version runs on my desktop but doesn't show any performance benefit. However, OMP is not in the ESIWACE2 deliverable so I'm going to park that for now.
I need to add a checksum output to ease verification.
I now have a version, with the compute moved into a subroutine, working on GPU. However, I can see that we get managed-memory traffic at the start of each compute region.
I think this must be because a lot of the work arrays are declared as automatic arrays and thus are re-allocated on the GPU each time the subroutine is called.
I've made the automatic arrays into module-scoped allocatables that are allocated just once.
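As a rough illustration of that change (not the actual code from this PR), the sketch below contrasts an automatic work array with a module-scoped allocatable that is allocated on first use. The module name `work_arrays_mod`, the routine names, the array `zwork` and the arithmetic are all hypothetical, and the sketch assumes the array extents are the same on every call.

```fortran
module work_arrays_mod
  ! Hypothetical module: names, shapes and arithmetic are illustrative only.
  implicit none
  private
  public :: compute_automatic, compute_persistent

  ! Module-scoped work array: allocated on first use and then reused.
  real, allocatable, save :: zwork(:,:)

contains

  ! Before: zwork_local is an automatic array, so its storage is
  ! created on entry and released on exit for every call.
  subroutine compute_automatic(n, m, field)
    integer, intent(in) :: n, m
    real, intent(inout) :: field(n, m)
    real :: zwork_local(n, m)   ! automatic array

    zwork_local = 2.0 * field
    field = field + zwork_local
  end subroutine compute_automatic

  ! After: the work array lives at module scope and is allocated once.
  ! Assumes n and m do not change between calls.
  subroutine compute_persistent(n, m, field)
    integer, intent(in) :: n, m
    real, intent(inout) :: field(n, m)

    if (.not. allocated(zwork)) allocate(zwork(n, m))

    ! Array-section assignment, so no reallocation on assignment occurs.
    zwork(:,:) = 2.0 * field
    field = field + zwork
  end subroutine compute_persistent

end module work_arrays_mod
```

Both routines produce the same result. The only difference is the lifetime of the work array, which is what should let a managed-memory runtime keep the same buffer between calls. The `allocate` guard could equally sit in a separate initialisation routine for the module.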
Presumably @rupertford, this solution won't work for SIR because I now have an allocate in the compute routine itself? I could move it out to an init method for the module?
I'm not actually sure. It may be OK as you can specify data as being local in the SIR which presumably means scoped within the code generated by SIR. But I've not looked at what gets generated.
It would be good if the CI installed PSyclone and then also built those targets that use it, but that's something for another PR.
This is ready for a first review now. Probably one for @rupertford.
Ready for another look from @rupertford now.
OK, this should be ready to go now.