Open zchunhai opened 6 years ago
Yes, we have such a plan, and your contributions are welcome.
@zchunhai Have you tried RDMA HCA mode? arena supports it so far. For SRIOV, are you looking for bandwidth isolation, or can you describe your scenario in more detail? Thanks.
In HCA mode, a veth is created separately from the IB device (using Calico or Flannel), and the interface bound to the netdev, e.g. eth0, is bridged to the host network and has no RDMA capabilities. This breaks applications that run on bare-metal RoCE and use the RDMA Connection Manager (rdmacm) API to resolve an IP address to the corresponding IB device.
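The mismatch described here can be checked from inside a container. The following is a minimal sketch, assuming the usual Linux sysfs layout in which each RDMA device lists its backing netdevs under /sys/class/infiniband/<device>/device/net/. The helper name `rdma_device_for_netdev` and the `sysfs_root` parameter are hypothetical, and the code only illustrates the netdev-to-device mapping; it is not how librdmacm resolves addresses internally.

```python
import os
from typing import Optional


def rdma_device_for_netdev(
    ifname: str, sysfs_root: str = "/sys/class/infiniband"
) -> Optional[str]:
    """Return the RDMA device whose sysfs entry lists `ifname`, or None.

    Assumption: <sysfs_root>/<device>/device/net/ contains one directory
    entry per netdev backed by the same PCI function as the RDMA device.
    """
    if not os.path.isdir(sysfs_root):
        return None
    for device in sorted(os.listdir(sysfs_root)):
        net_dir = os.path.join(sysfs_root, device, "device", "net")
        if os.path.isdir(net_dir) and ifname in os.listdir(net_dir):
            return device
    return None


if __name__ == "__main__":
    # For a veth such as eth0 in a pod, no RDMA device is expected to match.
    print(rdma_device_for_netdev("eth0"))
```

If the interface that carries the pod IP maps to no RDMA device, an rdmacm-based application that starts from that IP would have nothing to bind to, which matches the behaviour reported in this comment.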
@wsxiaozhang We have tried HCA mode with a RoCE adapter (so we do not configure IPoIB), and we found that a QP in the container cannot establish an RDMA connection:
[root@iperf-client-1 tmp]# ib_write_bw -d mlx5_0 &
[1] 208
[root@iperf-client-1 tmp]#
************************************
* Waiting for client to connect... *
************************************
[root@iperf-client-1 tmp]# ib_write_bw -d mlx5_0 localhost &
[2] 209
[root@iperf-client-1 tmp]#
[root@iperf-client-1 tmp]# ---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
TX depth : 128
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 0
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x011b PSN 0x46059f RKey 0x0036a2 VAddr 0x007f1542c3c000
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF Device : mlx5_0
Number of qps : 1 Transport type : IB
Connection type : RC Using SRQ : OFF
CQ Moderation : 100
Mtu : 1024[B]
Link type : Ethernet
GID index : 0
Max inline data : 0[B]
rdma_cm QPs : OFF
Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0000 QPN 0x011c PSN 0x88e426 RKey 0x00bab8 VAddr 0x007f33a21fb000
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
remote address: LID 0000 QPN 0x011b PSN 0x46059f RKey 0x0036a2 VAddr 0x007f1542c3c000
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Failed to modify QP 284 to RTR
Unable to Connect the HCA's through the link
remote address: LID 0000 QPN 0x011c PSN 0x88e426 RKey 0x00bab8 VAddr 0x007f33a21fb000
GID: 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
Failed to modify QP 283 to RTR
Unable to Connect the HCA's through the link
Is this configuration supported?
Also raised an issue upstream: https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin/issues/18
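Both ends in the log above report an all-zero GID at index 0. On RoCE, an RC QP needs a valid GID to move to RTR, so an empty GID table entry is a plausible (not confirmed) cause of the failure. The following is a minimal sketch for listing which GID indexes are populated, assuming the standard sysfs path /sys/class/infiniband/<device>/ports/<port>/gids/<index>. The function name `populated_gids` and the `sysfs_root` parameter are hypothetical.

```python
import os
from typing import Dict


def populated_gids(
    device: str, port: int = 1, sysfs_root: str = "/sys/class/infiniband"
) -> Dict[int, str]:
    """Map GID index -> GID string for every non-zero entry of a port."""
    gid_dir = os.path.join(sysfs_root, device, "ports", str(port), "gids")
    result: Dict[int, str] = {}
    if not os.path.isdir(gid_dir):
        return result
    for name in os.listdir(gid_dir):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(gid_dir, name)) as handle:
                gid = handle.read().strip()
        except OSError:
            # Reading an unused entry may fail on some kernels; skip it.
            continue
        # Keep only entries that contain at least one non-zero hex digit.
        if gid and set(gid.replace(":", "")) != {"0"}:
            result[int(name)] = gid
    return result


if __name__ == "__main__":
    for index, gid in sorted(populated_gids("mlx5_0").items()):
        print(index, gid)
```

If this prints nothing inside the container while the host shows populated entries, the GID table is not visible in the pod's network namespace. If a non-zero entry exists at another index, the perftest tools accept `-x <index>` to select it.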
In tfjob, is there a plan to support RDMA SRIOV without hostNetwork, using https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin?