Performance of 0.17 on V100 (Google Compute): 2000 n/s vs 6000 n/s

ghost commented 5 years ago

On Facebook [1] someone mentioned getting an average of 6000 n/s on a V100 using network #220.

I set up just such an instance yesterday (6 vCPUs, 1 V100), using Ubuntu 18.04 and CUDA 10, and am "only" getting around 1750 n/s (without "-t"), or at most 2100 n/s (using "-t 16").

Here is the output of "./leelaz -w best-network.gz":

Using OpenCL batch size of 5
Using 10 thread(s).
RNG seed: 11358463697549930105
Leela Zero 0.17 Copyright (C) 2017-2019 Gian-Carlo Pascutto and contributors
This program comes with ABSOLUTELY NO WARRANTY.
This is free software, and you are welcome to redistribute it
under certain conditions; see the COPYING file for details.

BLAS Core: built-in Eigen 3.3.7 library.
Detecting residual layers...v1...256 channels...40 blocks.
Initializing OpenCL (autodetecting precision).
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 CUDA 10.1.133
Platform profile: FULL_PROFILE
Platform name: NVIDIA CUDA
Platform vendor: NVIDIA Corporation
Device ID: 0
Device name: Tesla V100-SXM2-16GB
Device type: GPU
Device vendor: NVIDIA Corporation
Device driver: 418.56
Device speed: 1530 MHz
Device cores: 80 CU
Device score: 1112
Selected platform: NVIDIA CUDA
Selected device: Tesla V100-SXM2-16GB with OpenCL 1.2 capability.
Half precision compute support: No.
Tensor Core support: Yes.
OpenCL: using fp16/half or tensor core compute support.
Loaded existing SGEMM tuning.
Wavefront/Warp size: 32
Max workgroup size: 1024
Max workgroup dimensions: 1024 1024 64
Setting max tree size to 3736 MiB and cache size to 415 MiB.

Is there anything I could do to also get 6000 n/s? Do others get that performance as well?

[1] https://www.facebook.com/groups/go.igo.weiqi.baduk/permalink/10157283599366514/

nerai commented 5 years ago

You refer to n/s, which is an unreliable metric. A V100 should get something between 2k and 8k n/s, depending on the board and game.

It is better to also measure evals/s, which directly shows how fast the GPU is (whereas n/s reflects the combination of CPU and GPU). A V100 with 0.17 is probably around 2k evals/s. As for the maximum possible, I will publish a couple of tables about this in a few weeks.

ozymandias8 commented 5 years ago

My V100 on Google Cloud is averaging one game per 108 seconds, or 473 ms/move, over the last ~800 games. Not sure what that translates to in n/s, but it is quite a bit faster than my local GTX 1060 6GB. The script I'm using runs two games simultaneously, and the creator of the script claims it is significantly faster than running one game at a time.

nerai commented 5 years ago

@ozymandias8 It translates to 3400 n/s (which is in the expected range)
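For anyone wanting to redo this conversion, here is a rough sketch of the arithmetic. The thread does not say how the 3400 n/s figure was derived; the sketch assumes roughly 1600 visits per self-play move, which is an assumption rather than something stated above, and the helper name `nodes_per_second` is made up for illustration.

```python
def nodes_per_second(visits_per_move: float, ms_per_move: float) -> float:
    """Average search speed implied by a per-move time.

    visits_per_move: nodes searched for each move (assumed, see below).
    ms_per_move: average wall-clock time per move in milliseconds.
    """
    if ms_per_move <= 0:
        raise ValueError("ms_per_move must be positive")
    # nodes per move divided by seconds per move
    return visits_per_move * 1000.0 / ms_per_move


# Assumption (not stated in the thread): about 1600 visits per self-play move.
ASSUMED_VISITS_PER_MOVE = 1600

if __name__ == "__main__":
    # 473 ms/move is the figure reported above.
    print(round(nodes_per_second(ASSUMED_VISITS_PER_MOVE, 473)))  # prints 3383
```

Under that assumption, 473 ms/move comes out at about 3380 n/s, which is consistent with the rounded 3400 n/s quoted here. If the real visit count per move differs, the result scales proportionally.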

zhanzhenzhen commented 5 years ago

I get 1600 n/s on a V100. Maybe their 6000 n/s is only for the first move, which is 4x faster because of symmetry? Or maybe it's actually a 4-GPU machine?

lonemonkeywithwhiteshell commented 5 years ago

I'm sad that my V100 on Google Cloud is averaging 2700 ms/move over ~300 games. I don't know what to do...

ozymandias8 commented 5 years ago

Are you using the script from:

https://github.com/leela-zero/leela-zero/issues/1905

lonemonkeywithwhiteshell commented 5 years ago

I tried the master branch script, but an error happened:

No glanceslib.
Setting up glanceslib and all other leela-zero packages.
root@instance-fstbrnc:~# exit
logout
Hit:1 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:3 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:4 http://archive.canonical.com/ubuntu bionic InRelease
Hit:5 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Hit:6 http://security.ubuntu.com/ubuntu bionic-security InRelease
Reading package lists... Done
Hit:1 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic InRelease
Hit:2 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-updates InRelease
Hit:3 http://asia-east1.gce.archive.ubuntu.com/ubuntu bionic-backports InRelease
Hit:4 http://archive.canonical.com/ubuntu bionic InRelease
Hit:5 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:6 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu bionic InRelease
Reading package lists... Done
E: Could not get lock /var/lib/dpkg/lock-frontend - open (11: Resource temporarily unavailable)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?

I just copied and pasted the script...

Performance of 0.17 on V100 (Google Compute): 2000 n/s vs 6000 n/s #2335