kokkos / kokkos-tutorials

Tutorials for the Kokkos C++ Performance Portability Programming Ecosystem
https://kokkos.org

Add multi-gpu tutorial #88

Closed tcclevenger closed 6 months ago

tcclevenger commented 6 months ago

Add an exercise for multi-GPU. It uses the yT*A*x example common to many of the other exercises.

The Begin/ part of the exercise runs as-is: the yTAx computation is done with 2 different sets of y, A, x views, all on the same device. The Solution/ then splits the computation between 2 different devices, hopefully giving a 2x speedup.
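To illustrate the structure of the exercise without any GPU code, here is a minimal host-only analogy using just the C++ standard library. It is not the code from this PR and it does not use Kokkos: the two "devices" are stood in for by two host threads, and `Problem`, `make_problem`, `yAx`, `run_serial`, and `run_split` are hypothetical names chosen for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// One independent problem set: y (length n), A (n x n, row-major), x (length n).
struct Problem {
  std::size_t n;
  std::vector<double> y, A, x;
};

// Hypothetical helper: fill everything with 1.0, so y^T * A * x == n * n.
Problem make_problem(std::size_t n) {
  return Problem{n, std::vector<double>(n, 1.0),
                 std::vector<double>(n * n, 1.0),
                 std::vector<double>(n, 1.0)};
}

// Compute y^T * A * x for one problem set.
double yAx(const Problem& p) {
  assert(p.A.size() == p.n * p.n);
  double result = 0.0;
  for (std::size_t i = 0; i < p.n; ++i) {
    double row = 0.0;
    for (std::size_t j = 0; j < p.n; ++j) row += p.A[i * p.n + j] * p.x[j];
    result += p.y[i] * row;
  }
  return result;
}

// Begin-style: both problem sets are processed one after the other
// by the same worker (standing in for a single device).
double run_serial(const Problem& a, const Problem& b) {
  return yAx(a) + yAx(b);
}

// Solution-style: each problem set gets its own worker (standing in
// for its own device), and the two computations overlap in time.
double run_split(const Problem& a, const Problem& b) {
  auto fa = std::async(std::launch::async, [&a] { return yAx(a); });
  const double rb = yAx(b);  // second set runs on the calling thread
  return fa.get() + rb;
}
```

Both functions return the same value; only the scheduling differs. Because the two problem sets share no data, the split version needs no synchronization beyond waiting for each result, which is why a speedup close to 2x is a reasonable hope.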

I ran both on weaver (V100) and got:

Begin:
  N( 10000 ) nrepeat ( 100 ) problem( 1600.32 MB ) time( 2.48559 s ) bandwidth( 128.768 GB/s )

Solution:
  N( 10000 ) nrepeat ( 100 ) problem( 1600.32 MB ) time( 1.25715 s ) bandwidth( 254.596 GB/s )
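
As a quick check on how close these timings come to the hoped-for 2x, the speedup and parallel efficiency can be computed directly from the two reported times. The helper names `speedup` and `efficiency` below are made up for this sketch and are not part of the exercise.

```cpp
#include <cassert>

// Speedup of the split (Solution) run relative to the single-device (Begin) run.
double speedup(double begin_seconds, double solution_seconds) {
  assert(solution_seconds > 0.0);
  return begin_seconds / solution_seconds;
}

// Parallel efficiency for a given device count; 1.0 would be ideal scaling.
double efficiency(double measured_speedup, int devices) {
  assert(devices > 0);
  return measured_speedup / devices;
}
```

With the times above, `speedup(2.48559, 1.25715)` is about 1.98, and `efficiency` of that value on 2 devices is about 0.99. The ratio of the reported bandwidths (254.596 / 128.768) gives the same factor of about 1.98.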