Closed by roiser 1 year ago
(So will keep this ticket open for now, and make further PRs to address proposals in comment above).
As suggested by @roiser this can now be closed - we have an initial Alpaka implementation, then we'll see later how to make this evolve. Thanks @smithdh !
I've merged epoch1/alpaka code based on eemumu_AV alpaka_hw-abstraction-base-v0. However, I think there is still some work to do: currently the makefile is only set up to build a binary using OpenMP as the backend. I think a CUDA build should also be easy to set up, possibly by replacing the Makefiles with CMake and adding automatic discovery of CUDA, so that the CUDA binary is built whenever CUDA is available.
Currently the OpenMP version appears to produce a mean matrix element value consistent with the original C++ version (the random numbers are different), although performance appears to be low:
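As a rough illustration of the "automatic discovery" idea, CMake's standard `CheckLanguage` module can probe for a CUDA compiler without failing when none is installed. The sketch below is not the project's build file: the project name, target names and source file names (`check.cc`, `check.cu`) are hypothetical placeholders, and the alpaka-specific backend selection is deliberately left out.

```cmake
cmake_minimum_required(VERSION 3.18)
# Hypothetical project name, for illustration only.
project(alpaka_backends_sketch LANGUAGES CXX)

# OpenMP backend: the only one the current makefile builds.
find_package(OpenMP REQUIRED)
add_executable(check_omp check.cc)  # hypothetical source file
target_link_libraries(check_omp PRIVATE OpenMP::OpenMP_CXX)

# CUDA backend: check_language() sets CMAKE_CUDA_COMPILER only if a
# working CUDA compiler is found, so configuration still succeeds on
# machines without CUDA.
include(CheckLanguage)
check_language(CUDA)
if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
  message(STATUS "CUDA found (${CMAKE_CUDA_COMPILER}): adding CUDA build")
  add_executable(check_cuda check.cu)  # hypothetical source file
else()
  message(STATUS "CUDA not found: building the OpenMP backend only")
endif()
```

With something along these lines, `cmake -S . -B build && cmake --build build` would build the OpenMP binary everywhere and add the CUDA binary only on hosts where a CUDA compiler is detected.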
CPU, 32 hw threads with hyperthreading, CPU E5-2630 v3 @ 2.40GHz