spidernet-io / spiderpool

Underlay and RDMA network solution for Kubernetes, for bare metal, VMs, and any public cloud
https://spidernet-io.github.io/spiderpool/
Apache License 2.0

Multi-NIC: cross-track-access requires policy-based routing #3547

Closed: riverzhang closed this issue 1 week ago

riverzhang commented 1 month ago

What would you like to be added?


Spiderpool supports same-track routing configuration:

1. Add routes in pod1:

    ip route add 192.168.101.0/25 via 192.168.101.126 dev eth2
    ip route add 192.168.103.0/25 via 192.168.103.126 dev eth3

2. Add routes in pod2:

    ip route add 192.168.102.0/25 via 192.168.102.126 dev eth2
    ip route add 192.168.104.0/25 via 192.168.104.126 dev eth3

3. NCCL test:

| Network | NCCL test env | GPUDirect RDMA bandwidth (max. 25 GB/s) | NCCL test version |
|---|---|---|---|
| RoCE | NCCL_CROSS_NIC=1 | not supported | 2.13.8 |
| RoCE | NCCL_CROSS_NIC=0 | 18.75 GB/s | 2.13.8 |

Cross-track routing policy configured manually inside the container:

Route tables defined in the container (cat /etc/iproute2/rt_tables):

    101 track_first
    102 track_second
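Because the rules and routes below refer to track_first and track_second by name, those names have to exist in rt_tables before the ip commands run (ip also accepts numeric table IDs such as "table 101" directly, in which case no registration is needed). A minimal sketch, assuming a hypothetical helper called register_table that is not part of Spiderpool, which appends an entry only when the name is not registered yet so that re-running pod setup does not duplicate lines. It checks the table name only, not whether the numeric ID is already taken.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spiderpool or iproute2): add an
# "ID<TAB>NAME" entry to an rt_tables file unless NAME is already registered.
set -euo pipefail

# Usage: register_table FILE ID NAME
register_table() {
  local file=${1:?rt_tables file} id=${2:?table id} name=${3:?table name}
  # Exit status 0 if some non-comment line has NAME in its second column.
  # A missing file makes awk fail, which is treated as "not registered".
  if ! awk -v n="$name" '$1 !~ /^#/ && $2 == n { found = 1 } END { exit !found }' \
      "$file" 2>/dev/null; then
    printf '%s\t%s\n' "$id" "$name" >> "$file"
  fi
}

# Example (needs write access to the file, e.g. as root in the container):
#   register_table /etc/iproute2/rt_tables 101 track_first
#   register_table /etc/iproute2/rt_tables 102 track_second
```

Calling it twice with the same name leaves a single entry, which keeps container restarts or repeated setup scripts harmless.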

1. Add the default route of each track table in pod1:

    ip route add default via 192.168.101.126 dev eth2 table track_first
    ip route add default via 192.168.103.126 dev eth3 table track_second

2. Add cross-track routes in pod1:

    ip route add 192.168.102.0/25 via 192.168.101.126 dev eth2
    ip route add 192.168.104.0/25 via 192.168.103.126 dev eth3

3. Add policy rules in pod1:

    ip rule add from 192.168.101.0/25 table track_first
    ip rule add from 192.168.102.0/25 table track_first
    ip rule add from 192.168.103.0/25 table track_second
    ip rule add from 192.168.104.0/25 table track_second

4. Add the default route of each track table in pod2:

    ip route add default via 192.168.102.126 dev eth2 table track_first
    ip route add default via 192.168.104.126 dev eth3 table track_second

5. Add cross-track routes in pod2:

    ip route add 192.168.101.0/25 via 192.168.102.126 dev eth2
    ip route add 192.168.103.0/25 via 192.168.104.126 dev eth3

6. Add policy rules in pod2:

    ip rule add from 192.168.101.0/25 table track_first
    ip rule add from 192.168.102.0/25 table track_first
    ip rule add from 192.168.103.0/25 table track_second
    ip rule add from 192.168.104.0/25 table track_second
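Steps 1 to 6 follow one pattern per pod: a default route per track in that track's table, a main-table route to the peer subnet through the pod's own gateway on the same NIC, and the same four source rules on both pods. The sketch below captures that pattern in a hypothetical helper, cross_track_routes (not a Spiderpool feature), which only prints the commands so they can be reviewed first. It assumes the NICs are named eth2 and eth3, that track_first and track_second are already registered, and that the rule subnets match the example; all of these are values to adapt.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spiderpool): print the cross-track
# policy-routing commands from steps 1-6 for one pod. Assumes RDMA NICs
# eth2/eth3 and route tables track_first/track_second already registered.
set -euo pipefail

# Source subnets steered into each track table; identical on pod1 and pod2
# in the example (assumption: adjust to your own IP pools).
TRACK_FIRST_SUBNETS=(192.168.101.0/25 192.168.102.0/25)
TRACK_SECOND_SUBNETS=(192.168.103.0/25 192.168.104.0/25)

# Usage: cross_track_routes GW_ETH2 GW_ETH3 PEER_SUBNET_ETH2 PEER_SUBNET_ETH3
cross_track_routes() {
  local gw2=${1:?gateway of eth2} gw3=${2:?gateway of eth3}
  local peer2=${3:?peer subnet reached via eth2}
  local peer3=${4:?peer subnet reached via eth3}
  local subnet

  # Default route of each track, in that track's own table
  echo "ip route add default via $gw2 dev eth2 table track_first"
  echo "ip route add default via $gw3 dev eth3 table track_second"

  # Cross-track routes in the main table: reach the other pod's subnet
  # through this pod's own gateway on the same NIC
  echo "ip route add $peer2 via $gw2 dev eth2"
  echo "ip route add $peer3 via $gw3 dev eth3"

  # Source-based rules selecting the track table
  for subnet in "${TRACK_FIRST_SUBNETS[@]}"; do
    echo "ip rule add from $subnet table track_first"
  done
  for subnet in "${TRACK_SECOND_SUBNETS[@]}"; do
    echo "ip rule add from $subnet table track_second"
  done
}

# When executed directly with four arguments, print the commands
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 4 ]]; then
  cross_track_routes "$@"
fi
```

Saved under a hypothetical name such as cross_track_routes.sh, pod1 would use "bash cross_track_routes.sh 192.168.101.126 192.168.103.126 192.168.102.0/25 192.168.104.0/25" and pod2 "bash cross_track_routes.sh 192.168.102.126 192.168.104.126 192.168.101.0/25 192.168.103.0/25". Piping the output to sh inside the pod applies it, which requires the NET_ADMIN capability; ip rule show and ip route show table track_first can then be used to inspect the result.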

7. NCCL test:

| Network | NCCL test env | GPUDirect RDMA bandwidth (max. 25 GB/s) | NCCL test version |
|---|---|---|---|
| RoCE | NCCL_CROSS_NIC=1 | 20.59 GB/s | 2.13.8 |
| RoCE | NCCL_CROSS_NIC=0 | 18.75 GB/s | 2.13.8 |

Why is this needed?

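Cross-track communication performs better than same-track: with the policy routing above, NCCL_CROSS_NIC=1 reaches 20.59 GB/s versus 18.75 GB/s with NCCL_CROSS_NIC=0, while with same-track routing alone NCCL_CROSS_NIC=1 is not supported at all. In addition, the application enables cross-track by default.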
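A quick worked comparison of the two RoCE results from the policy-routing test (both with NCCL 2.13.8, against the stated 25 GB/s maximum), showing the size of the gain:

$$
\frac{20.59 - 18.75}{18.75} = \frac{1.84}{18.75} \approx 0.098 \quad (\approx 9.8\%\ \text{higher with cross-track})
$$

$$
\frac{20.59}{25} \approx 82.4\%, \qquad \frac{18.75}{25} = 75.0\% \quad \text{of the maximum bandwidth}
$$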

How to implement it (if possible)?

No response

Additional context

No response