vetter / shoc

The SHOC Benchmark Suite

Why do some benchmarks not show a speedup when running on multiple devices? #55

Open gareth-ferneyhough opened 7 years ago

gareth-ferneyhough commented 7 years ago

I am expecting to observe a speedup when I run either an EP or TP benchmark on multiple devices, but that is not the case. The Stencil2D benchmark does show a speedup when I use multiple devices:

    ./shocdriver -d 0 -cuda -s 4 -benchmark Stencil2D
    result for stencil: 141.2280 GFLOPS

vs.

    ./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Stencil2D
    result for stencil: 406.1190 GFLOPS

However, this is the only benchmark I have found (so far) that shows a speedup. For example:

    ./shocdriver -d 0 -cuda -s 4 -benchmark Scan
    result for scan: 46.8924 GB/s

vs.

    ./shocdriver -d 0,1,2,3 -cuda -s 4 -benchmark Scan
    result for scan: 46.8561 GB/s

Similarly, Reduction and GEMM show no improvement either. Am I missing something here? I am running version 1.1.5.

cponder commented 3 years ago

I see increased performance with the MaxFlops & QTC benchmarks.

In my runs, at least, the Stencil2D GFLOPS metric holds steady, and so do the Scan results. It may be that the runs on each GPU are executed in sequence, so the total time grows in proportion to the number of GPUs and the time-normalized performance metrics average out to the same value.
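
To make the arithmetic behind that guess concrete, here is a minimal sketch. It is only a model of the hypothesis, not a description of how SHOC actually schedules work; the function name `aggregate_rate` and the numbers (46 GB of work taking 1 s per device) are made up for illustration.

```python
def aggregate_rate(work_per_device, time_per_device, num_devices, concurrent):
    """Return total work divided by elapsed wall time.

    Hypothetical model: every device does the same amount of work in the
    same time. If the devices run one after another, elapsed time is the
    sum of the per-device times; if they run concurrently, it is just one
    per-device time.
    """
    total_work = work_per_device * num_devices
    if concurrent:
        elapsed = time_per_device
    else:
        elapsed = time_per_device * num_devices
    return total_work / elapsed


# Illustrative values only (not measured): 46 GB processed in 1 s per device.
work_gb = 46.0
time_s = 1.0

for n in (1, 4):
    seq = aggregate_rate(work_gb, time_s, n, concurrent=False)
    par = aggregate_rate(work_gb, time_s, n, concurrent=True)
    print(f"{n} device(s): sequential {seq:.1f} GB/s, concurrent {par:.1f} GB/s")
```

Under this model the sequential rate is the same for 1 and 4 devices, which would match the flat Scan numbers, while the concurrent rate scales with the device count. Whether a given benchmark really runs its devices in sequence would need to be checked against the SHOC source.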