vetter / shoc

The SHOC Benchmark Suite

MaxFlops' performance on multiple GPUs #41

Closed bald34 closed 9 years ago

bald34 commented 10 years ago

Hello. I've compiled SHOC 1.1.5 with CUDA/OpenCL/MPI support under CentOS 6.5 with CUDA 6.5, Intel Compiler 11.1, Intel MKL 11.1 and OpenMPI 1.8.1 installed. The machine has four NVIDIA Tesla K20m GPUs. When I run it on all GPUs, MaxFlops reports the same performance as when I run the test on a single GPU. This happens in both OpenCL and CUDA modes. I tried changing the problem size from "-s 1" to "-s 4", but nothing changed. The console output is below:

[bald@node8 bin]$ ./shocdriver -cuda -s 1 -d 0
--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: node8.cluster
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 4
Device 0: 'Tesla K20m'
Device 1: 'Tesla K20m'
Device 2: 'Tesla K20m'
Device 3: 'Tesla K20m'
Specified 1 device IDs: 0
Using size class: 1

--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
    result for bspeed_download: 6.2430 GB/sec
Running benchmark BusSpeedReadback
    result for bspeed_readback: 6.6992 GB/sec
Running benchmark MaxFlops
    result for maxspflops: 3099.6100 GFLOPS
    result for maxdpflops: 1164.3600 GFLOPS

[bald@node8 bin]$ ./shocdriver -cuda -s 1 -d 0,1,2,3
--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: node8.cluster
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 4
Device 0: 'Tesla K20m'
Device 1: 'Tesla K20m'
Device 2: 'Tesla K20m'
Device 3: 'Tesla K20m'
Specified 4 device IDs: 0,1,2,3
Using size class: 1

--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
    result for bspeed_download: 6.1165 GB/sec
Running benchmark BusSpeedReadback
    result for bspeed_readback: 6.6993 GB/sec
Running benchmark MaxFlops
    result for maxspflops: 3099.1200 GFLOPS
    result for maxdpflops: 1165.0200 GFLOPS

However, according to nvidia-smi, all of the GPUs were almost fully loaded by the MaxFlops application:

[root@node8 ~]# nvidia-smi
Thu Sep  4 16:29:27 2014
+------------------------------------------------------+
| NVIDIA-SMI 340.29     Driver Version: 340.29         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 0000:02:00.0     Off |                    0 |
| N/A   37C    P0   150W / 225W |     96MiB /  4799MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 0000:03:00.0     Off |                    0 |
| N/A   38C    P0   149W / 225W |     96MiB /  4799MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 0000:81:00.0     Off |                    0 |
| N/A   36C    P0   153W / 225W |     96MiB /  4799MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K20m          Off  | 0000:82:00.0     Off |                    0 |
| N/A   38C    P0   117W / 225W |     96MiB /  4799MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0      2576  /home/bald/Downloads/shoc/bin/EP/CUDA/MaxFlops        80MiB |
|    1      2577  /home/bald/Downloads/shoc/bin/EP/CUDA/MaxFlops        80MiB |
|    2      2578  /home/bald/Downloads/shoc/bin/EP/CUDA/MaxFlops        80MiB |
|    3      2579  /home/bald/Downloads/shoc/bin/EP/CUDA/MaxFlops        80MiB |
+-----------------------------------------------------------------------------+

I want to see the overall performance of my hybrid PC. What did I do wrong? Is this behavior normal for SHOC? Thank you.

rothpc commented 9 years ago

I think the shocdriver script works differently than you expect. It is intended to report the best observed MaxFlops value from any device in the system, not the sum of the MaxFlops values across all devices in the system.

That said, I believe the findanymean function in the shocdriver script is incorrectly identifying which number to report. findanymean is only used when reporting the MaxFlops, so it only affects the reporting of that metric's value. We're investigating how to fix this.

rothpc commented 9 years ago

Commit cae885b2 fixed the incorrect shocdriver script results for MaxFlops (and for QTC) running with multiple devices.

A correction to my comment from yesterday - the shocdriver script reports the mean value of the best MaxFlops values for each device in the system. So one can multiply the reported value by the number of devices the driver was run on and obtain the total measured MaxFlops for the devices used.
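The arithmetic in the correction above can be sketched in a few lines of Python. This is only an illustration, not part of SHOC: the function name `total_from_mean` and the per-device values are hypothetical, and the sketch assumes the reported figure really is the arithmetic mean of the per-device best values, as described.

```python
def total_from_mean(mean_gflops: float, num_devices: int) -> float:
    """Recover the aggregate rate from a per-device mean: total = mean * n."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    return mean_gflops * num_devices


# Hypothetical best MaxFlops values (GFLOPS), one entry per GPU.
# These are made-up illustration values, not measurements.
per_device_best = [3100.0, 3098.0, 3101.0, 3097.0]

# The mean of the per-device best values, i.e. what the driver is said to report.
mean_best = sum(per_device_best) / len(per_device_best)

# Multiplying the mean by the device count gives back the sum over devices.
total = total_from_mean(mean_best, len(per_device_best))

print(f"mean of per-device best: {mean_best:.2f} GFLOPS")
print(f"estimated total:         {total:.2f} GFLOPS")
```

Because the mean is the sum divided by the device count, multiplying it by the device count simply recovers the sum of the per-device values. It does not measure any additional scaling effect from running the devices concurrently.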