stfc / PSycloneBench

Various benchmarks used to inform PSyclone optimisations
BSD 3-Clause "New" or "Revised" License

NemoLite2D accelerated versions provide their own data synchronization method. #57

Closed sergisiso closed 4 years ago

sergisiso commented 4 years ago

This avoids extra copies in the C OpenCL and Kokkos manual implementations, which brings their execution time down significantly. Fixes #56, and I think we can also close #31 when this is merged. Since it brings dl_esm_inf up to the latest version, it also fixes #52.

It also works with the Fortran OpenCL and OpenACC versions; in these cases the performance remains the same.
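As a rough illustration of why an implementation-provided synchronization method can remove copies, the sketch below models a field that copies device data back to the host only when the host actually reads it. All names here (`Field`, `set_read_from_device`, `mark_device_modified`, `run_steps`) are hypothetical and are not the dl_esm_inf API; the "device" is just an ordinary vector so that the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical field type: names are illustrative, not the dl_esm_inf API.
class Field {
public:
    using SyncFn = std::function<void(std::vector<double>& host)>;

    explicit Field(std::size_t n) : host_(n, 0.0) {}

    // An accelerated implementation registers how to fetch its device data.
    void set_read_from_device(SyncFn fn) { read_from_device_ = std::move(fn); }

    // Called by the implementation after a kernel has written the device copy.
    void mark_device_modified() { host_is_stale_ = true; }

    // Host access: copy back only if the device copy is newer.
    const std::vector<double>& get_data() {
        if (host_is_stale_ && read_from_device_) {
            read_from_device_(host_);
            ++copies_;
            host_is_stale_ = false;
        }
        return host_;
    }

    // Number of device-to-host copies performed so far.
    int copies() const { return copies_; }

private:
    std::vector<double> host_;
    SyncFn read_from_device_;
    bool host_is_stale_ = false;
    int copies_ = 0;
};

// Stand-in for a time-stepping loop; `device` plays the role of device memory.
inline int run_steps(Field& f, std::vector<double>& device, int nsteps) {
    f.set_read_from_device(
        [&device](std::vector<double>& host) { host = device; });
    for (int step = 0; step < nsteps; ++step) {
        for (double& v : device) v += 1.0;  // pretend kernel on the device
        f.mark_device_modified();           // no host copy happens here
    }
    f.get_data();  // first host access triggers the single copy back
    f.get_data();  // already in sync, so no further copy
    return f.copies();
}
```

In this toy model, ten time steps followed by host reads result in one copy rather than one per step, which is the kind of saving described above.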


Additionally, FortCL is now a direct submodule of PSycloneBench (instead of dl_esm_inf) and most manual implementations can avoid compiling it (or its stubs) if they are not using the library.

This PR needs:

Finally, this PR breaks the PSyclone-generated OpenCL and OpenACC versions of NemoLite2D (the OpenACC one was already broken), but the fix needs to be implemented in PSyclone and not in this repository.

sergisiso commented 4 years ago

@arporter This PR is ready for review. It updates dl_esm_inf to the latest version and makes the changes needed to relocate FortCL and to use the new dl_esm_inf device infrastructure in all the versions that use them.

sergisiso commented 4 years ago

I've not actually tested any of the new code because I don't think I have a working OpenCL or Kokkos installation (unless the OCL on my desktop is still OK). Is there a relatively straightforward way to get either of these, do you know?

@arporter I think the easiest way to test correctness for OpenCL is to install the PoCL (Portable Computing Language) implementation, which is available in the Ubuntu and CentOS package managers. After installing the package, everything should already work.

For Kokkos, the Kokkos source needs to be available on the system. Do you think it would make sense to add Kokkos as a submodule to make the process straightforward? This would allow it to be part of the compilation testing in Travis.

In fact, I could try to make both part of the Travis run so that they are tested from now on.

sergisiso commented 4 years ago

@arporter I addressed the comments and also made the NemoLite2D OpenCL and Kokkos compilations (not the execution) part of the Travis check. Would it make sense to let Travis run a small test case with known results as part of the check (maybe in another PR)?

sergisiso commented 4 years ago

@arporter This is ready for the next review. Note that I have also introduced a TILE parameter in the Kokkos implementation to squeeze a little more performance out of it.
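For readers unfamiliar with tiling, here is a minimal plain C++ sketch of what a tile-size parameter controls in a 2D loop nest. It is not the code from this PR: the value of `TILE`, the function name `increment_tiled`, and the loop body are assumptions chosen for illustration. In Kokkos itself, a tile size is typically passed as the third argument of a `Kokkos::MDRangePolicy`, but whether the PR uses it that way is not stated here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile size; the PR does not state the value it uses.
constexpr std::size_t TILE = 4;

// Visit an ny-by-nx row-major grid tile by tile and add 1.0 to every cell.
// Each (jj, ii) pair selects one TILE x TILE block; std::min clips the
// blocks that overlap the grid boundary.
inline void increment_tiled(std::vector<double>& grid,
                            std::size_t ny, std::size_t nx) {
    for (std::size_t jj = 0; jj < ny; jj += TILE) {
        for (std::size_t ii = 0; ii < nx; ii += TILE) {
            const std::size_t jend = std::min(jj + TILE, ny);
            const std::size_t iend = std::min(ii + TILE, nx);
            for (std::size_t j = jj; j < jend; ++j) {
                for (std::size_t i = ii; i < iend; ++i) {
                    grid[j * nx + i] += 1.0;
                }
            }
        }
    }
}
```

The result is identical to an untiled loop; only the traversal order changes, which can improve cache reuse for stencil-like kernels. The best tile size depends on the hardware and has to be found by measurement.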