twitter-archive / torch-ipc

A set of primitives for parallel computation in Torch
Apache License 2.0

What is the correct setting of GPU cards? #17

Closed · chienlinhuang1116 closed this issue 8 years ago

chienlinhuang1116 commented 8 years ago

Hi, I have a machine with 8 GPU cards; their topology matrix is shown below. Is this a correct setup for testing torch-ipc, or do you have any comments?

Thanks, Chien-Lin

-bash-4.1$ nvidia-smi topo --matrix
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    CPU Affinity
GPU0     X      PIX     SOC     SOC     SOC     SOC     SOC     SOC     0-7,16-23
GPU1    PIX      X      SOC     SOC     SOC     SOC     SOC     SOC     0-7,16-23
GPU2    SOC     SOC      X      PIX     PHB     PHB     PHB     PHB     8-15,24-31
GPU3    SOC     SOC     PIX      X      PHB     PHB     PHB     PHB     8-15,24-31
GPU4    SOC     SOC     PHB     PHB      X      PIX     PXB     PXB     8-15,24-31
GPU5    SOC     SOC     PHB     PHB     PIX      X      PXB     PXB     8-15,24-31
GPU6    SOC     SOC     PHB     PHB     PXB     PXB      X      PIX     8-15,24-31
GPU7    SOC     SOC     PHB     PHB     PXB     PXB     PIX      X      8-15,24-31

Legend:

  X   = Self
  SOC = Path traverses a socket-level link (e.g. QPI)
  PHB = Path traverses a PCIe host bridge
  PXB = Path traverses multiple PCIe internal switches
  PIX = Path traverses a PCIe internal switch
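From the matrix, CUDA peer-to-peer looks possible only between GPUs that stay within one PCIe root complex (PIX/PXB/PHB); pairs marked SOC cross the QPI link. A minimal sketch for pinning a 2-GPU test to such a pair, e.g. GPU0/GPU1 (the script name is only a placeholder):

# Hypothetical sketch: expose only GPU0 and GPU1 (a PIX pair above) so any
# CUDA IPC/P2P traffic stays on a single PCIe switch.
CUDA_VISIBLE_DEVICES=0,1 ./your-torch-ipc-test.sh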

In addition, here are the results I get when running on 2 GPUs:

-bash-4.1$ ./mnist-cuda45-PIX.sh
INFO: torch-ipc: CUDA IPC enabled between GPU1 and GPU0
INFO: torch-ipc: CUDA IPC enabled between GPU0 and GPU1
**Output: incorrect results (long time & low accuracy)**

-bash-4.1$ ./mnist-cuda46-PXB.sh
INFO: torch-ipc: CUDA IPC enabled between GPU1 and GPU0
INFO: torch-ipc: CUDA IPC enabled between GPU0 and GPU1
**Output: incorrect results (long time & low accuracy)**

-bash-4.1$ ./mnist-cuda43-PHB.sh
INFO: torch-ipc: CUDA IPC enabled between GPU1 and GPU0
INFO: torch-ipc: CUDA IPC enabled between GPU0 and GPU1
**Output: 1-epoch, global correct: 97.9%, total time: 108.4 sec**

-bash-4.1$ ./mnist-cuda43.sh
INFO: torch-ipc: CUDA IPC not possible between GPU1 and GPU0
INFO: torch-ipc: CUDA IPC not possible between GPU0 and GPU1
**Output: 1-epoch, global correct: 98.1%, total time: 113.5 sec**
zakattacktwitter commented 8 years ago

Hi,

Sorry you are having so much trouble. Can you run the CUDA SDK sample for simpleP2P? I have a feeling there is something wrong with your machine's setup.
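If the sample isn't built yet, something like this should do it (a rough sketch; the samples path depends on your CUDA version and install location, assumed here to be /usr/local/cuda/samples):

# Build and run the peer-to-peer sample from the CUDA samples tree.
cd /usr/local/cuda/samples/0_Simple/simpleP2P
make
./simpleP2P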

There's a PCIe security feature called ACS (Access Control Services) that needs to be disabled in order for the GPUs to be able to talk to each other directly.

First find out your PCI switch addresses:

lspci | grep -i plx

You should see something like this as output:

02:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
03:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
03:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
06:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
07:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
07:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)

Then look to see if ACS is enabled on each one:

lspci -s 02:00.0 -vvvv|grep -i acs

If you see results like the following, then ACS is enabled and you need to turn it off.

UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-

See this post from SuperMicro for more information: http://www.supermicro.com/support/faqs/faq.cfm?faq=20732
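One possible way to clear ACSCtl on every PLX bridge is sketched below (not a vetted procedure: the ECAP_ACS name needs a reasonably recent pciutils, and the setting does not survive a reboot unless it is re-applied at boot):

# Clear the ACS Control register (6 bytes into the ACS extended capability)
# on each PLX switch found above.
for dev in $(lspci | grep -i plx | awk '{print $1}'); do
    sudo setpci -s "$dev" ECAP_ACS+0x6.w=0000
done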

Hope this helps!

chienlinhuang1116 commented 8 years ago

Hi Zak.

Here is the output:

-bash-4.1$ lspci | grep -i plx
02:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
03:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
03:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
81:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
82:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
82:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
85:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
86:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
86:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
87:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
88:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
88:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
8b:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
8c:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
8c:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)

Then I checked whether ACS is enabled on each one, but there is no output at all. Do you have any idea why?

-bash-4.1$ lspci -s 02:00.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 03:08.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 03:10.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 81:00.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 82:08.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 82:10.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 85:00.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 86:08.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 86:10.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 87:00.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 88:08.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 88:10.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 8b:00.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 8c:08.0 -vvvv|grep -i acs
-bash-4.1$ lspci -s 8c:10.0 -vvvv|grep -i acs

Thank you very much & sorry for the hassle.

Chien-Lin

zakattacktwitter commented 8 years ago

Did you try the simpleP2P example that comes with the CUDA SDK?


chienlinhuang1116 commented 8 years ago

Thank you Zak, you are right: the problem is solved by disabling ACS.

I can get the same ACS output as yours after using sudo. There is a related discussion, "Multi-GPU Peer to Peer access failing on Tesla K80", at https://devtalk.nvidia.com/default/topic/883054/cuda-programming-and-performance/multi-gpu-peer-to-peer-access-failing-on-tesla-k80. I ran the simpleP2P example with ACSCtl enabled and then disabled; the output of both runs is shown below.
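For reference, a sketch of the root-level ACSCtl check across all the PLX bridges (non-root lspci typically omits extended capability registers, which is why the earlier grep printed nothing):

for dev in $(lspci | grep -i plx | awk '{print $1}'); do
    echo "== $dev =="
    sudo lspci -s "$dev" -vvvv | grep -i acsctl
done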

With ACSCtl enabled:

-bash-4.1$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 8
> GPU0 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU2 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU3 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU4 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU5 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU6 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU7 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU6) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 1.11GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Verification error @ element 0: val = nan, ref = 0.000000
Verification error @ element 1: val = nan, ref = 4.000000
Verification error @ element 2: val = nan, ref = 8.000000
Verification error @ element 3: val = nan, ref = 12.000000
Verification error @ element 4: val = nan, ref = 16.000000
Verification error @ element 5: val = nan, ref = 20.000000
Verification error @ element 6: val = nan, ref = 24.000000
Verification error @ element 7: val = nan, ref = 28.000000
Verification error @ element 8: val = nan, ref = 32.000000
Verification error @ element 9: val = nan, ref = 36.000000
Verification error @ element 10: val = nan, ref = 40.000000
Verification error @ element 11: val = nan, ref = 44.000000
Disabling peer access...
Shutting down...
Test failed!

With ACSCtl disabled:

-bash-4.1$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 8
> GPU0 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU2 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU3 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU4 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU5 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU6 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)
> GPU7 = "      Tesla K80" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU6) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 7.42GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed