kubeflow / arena

A CLI for Kubeflow.
Apache License 2.0

In tfjob, is there a plan to support RDMA SRIOV with non hostNetwork? #67

Open zchunhai opened 5 years ago

zchunhai commented 5 years ago

In tfjob, is there a plan to support RDMA SR-IOV without hostNetwork, using https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin?

cheyang commented 5 years ago

Yes, we have such a plan, and your contributions are welcome.

wsxiaozhang commented 5 years ago

@zchunhai Have you tried RDMA HCA mode? It is what arena supports so far. For SR-IOV, are you looking for bandwidth isolation, or can you describe your scenario in more detail? Thanks.

byronyi commented 5 years ago

In HCA mode, a veth is created separately from the IB device (using Calico or Flannel), and the interface bound to the netdev, e.g. eth0, is bridged to the host network and has no RDMA capabilities. This breaks applications that run on bare-metal RoCE and use the RDMA Connection Manager (rdmacm) API to resolve an IP address to the corresponding IB device.
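
To make the netdev/IB device mismatch easier to check from inside a pod, here is a minimal sketch that lists which netdev each populated GID entry is tied to. It assumes the usual Linux sysfs layout `/sys/class/infiniband/<dev>/ports/<port>/gid_attrs/ndevs/<index>`; the function name `gid_netdevs` and the `sysfs_root` parameter are made up for illustration and are not part of arena or the device plugin.

```python
from pathlib import Path


def gid_netdevs(sysfs_root="/sys/class/infiniband"):
    """Return {(device, port, gid_index): netdev} for populated GID entries.

    Assumes the standard sysfs layout; `sysfs_root` is a hypothetical
    parameter so the function can be pointed at a test directory.
    """
    mapping = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return mapping  # no RDMA devices visible in this namespace
    for dev in sorted(root.iterdir()):
        ports = dev / "ports"
        if not ports.is_dir():
            continue
        for port in sorted(ports.iterdir()):
            ndevs = port / "gid_attrs" / "ndevs"
            if not ndevs.is_dir():
                continue
            entries = (p for p in ndevs.iterdir() if p.name.isdigit())
            for entry in sorted(entries, key=lambda p: int(p.name)):
                try:
                    name = entry.read_text().strip()
                except OSError:
                    # Unpopulated GID slots typically fail on read; skip them.
                    continue
                if name:
                    mapping[(dev.name, port.name, int(entry.name))] = name
    return mapping
```

Calling `gid_netdevs()` in the container and comparing the result with the pod's own interface list should show whether any GID entry is associated with the pod's eth0 at all.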

asdfsx commented 5 years ago

@wsxiaozhang We have tried HCA mode with a RoCE adapter (so we do not configure IPoIB), and we found that a QP in the container cannot establish an RDMA connection:

[root@iperf-client-1 tmp]# ib_write_bw -d mlx5_0 &
[1] 208
[root@iperf-client-1 tmp]# 
************************************
* Waiting for client to connect... *
************************************

[root@iperf-client-1 tmp]# ib_write_bw -d mlx5_0 localhost &
[2] 209
[root@iperf-client-1 tmp]# 
[root@iperf-client-1 tmp]# ---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF      Device         : mlx5_0
 Number of qps   : 1        Transport type : IB
 Connection type : RC       Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x011b PSN 0x46059f RKey 0x0036a2 VAddr 0x007f1542c3c000
 GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF      Device         : mlx5_0
 Number of qps   : 1        Transport type : IB
 Connection type : RC       Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 0
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x011c PSN 0x88e426 RKey 0x00bab8 VAddr 0x007f33a21fb000
 GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
 remote address: LID 0000 QPN 0x011b PSN 0x46059f RKey 0x0036a2 VAddr 0x007f1542c3c000
 GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Failed to modify QP 284 to RTR
 Unable to Connect the HCA's through the link
 remote address: LID 0000 QPN 0x011c PSN 0x88e426 RKey 0x00bab8 VAddr 0x007f33a21fb000
 GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Failed to modify QP 283 to RTR
 Unable to Connect the HCA's through the link

Is this configuration supported?
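
The log above reports an all-zero GID at GID index 0 for both the local and remote addresses. With `Link type : Ethernet` (RoCE), that may be related to the failure to move the QP to RTR; this is an assumption and not something confirmed in this thread. A minimal sketch for spotting it in saved `ib_write_bw` output follows. The helper name `zero_gids` is hypothetical, and it only parses lines shaped like the `GID:` lines shown here.

```python
import re

# Matches lines such as " GID: 00:00:...:00" (16 colon-separated hex bytes).
GID_LINE = re.compile(r"^\s*GID:\s*((?:[0-9a-fA-F]{2}:){15}[0-9a-fA-F]{2})\s*$")


def zero_gids(output):
    """Return (all_zero_count, total_count) for GID lines in the given text."""
    total = 0
    zero = 0
    for line in output.splitlines():
        match = GID_LINE.match(line)
        if not match:
            continue
        total += 1
        # A GID is "empty" when every byte is 00.
        if set(match.group(1).split(":")) == {"00"}:
            zero += 1
    return zero, total
```

If every GID line comes back all-zero, rerunning the test with a different GID index (the `-x` option of `ib_write_bw`) may be worth trying, provided some index is populated inside the container.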

asdfsx commented 5 years ago

Also raised an issue upstream: https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/issues/18