[StatefulSet/Parallel] Virtual Functions Issues When Creating/Destroying Pods in Parallel. Switching devices

midnattsol commented 4 months ago

Environment

I'm trying to create a StatefulSet in which each pod is assigned 2 VFs, associated with 2 InfiniBand PF interfaces.
I'm creating 8 pods per server.
Every server has 16 InfiniBand interfaces, and I create 1 VF per interface.

Problem Description

When I create the StatefulSet in parallel, or when it terminates (all the pods terminate at the same time), some interfaces randomly switch the PCI device they point to.

So when I check the associated devices on the host, they are completely mixed up:

# cat /sys/class/infiniband_verbs/uverbs{19,20,21,22,23,24}/ibdev
mlx5_24
mlx5_23
mlx5_21
mlx5_22
mlx5_20
mlx5_19

The uverbs19 is pointing to the interface 24
The uverbs20 is pointing to the interface 23
The uverbs23 is pointing to the interface 20
The uverbs24 is pointing to the interface 19

So the pods cannot recognize the mlx interfaces to use them with UCX.
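
The check above can be automated with a small shell sketch. It assumes, as the listing implies, that on a healthy host `uverbsN` maps to `mlx5_N`; that is not guaranteed on every system. The function name `check_uverbs` and its optional directory argument are hypothetical, introduced only for illustration.

```shell
#!/bin/sh
# Sketch: report every uverbsN whose ibdev is not mlx5_N.
# Assumption: the "expected" mapping is uverbsN -> mlx5_N (inferred from the listing above).
# The optional first argument (hypothetical) overrides the sysfs directory to scan.
check_uverbs() {
    root="${1:-/sys/class/infiniband_verbs}"
    rc=0
    for dir in "$root"/uverbs*; do
        # Skip entries without a readable ibdev file (also covers an unmatched glob).
        [ -r "$dir/ibdev" ] || continue
        idx="${dir##*uverbs}"          # numeric suffix, e.g. 19
        ibdev="$(cat "$dir/ibdev")"    # e.g. mlx5_24
        if [ "$ibdev" != "mlx5_$idx" ]; then
            echo "uverbs$idx -> $ibdev (expected mlx5_$idx)"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling `check_uverbs` with no argument scans the real sysfs path; it prints one line per mismatch and returns non-zero if any were found.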

Workaround so far

Creating the cluster sequentially and scaling to 0 before terminating the StatefulSet helps, because the race condition is not triggered, but I guess this is not the expected behaviour.
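
A minimal sketch of the teardown half of this workaround, assuming the StatefulSet uses the default `OrderedReady` pod management policy (under which scale-down removes pods one at a time) and that its pods share a label selector. The function name `teardown_statefulset`, the `KUBECTL` override, and the example names are hypothetical.

```shell
#!/bin/sh
# Sketch: scale a StatefulSet to 0, wait for its pods to go away, then delete it.
# Assumptions: default OrderedReady policy; $2 is a label selector matching the pods.
# KUBECTL (hypothetical) lets the kubectl binary be overridden.
teardown_statefulset() {
    name="$1"
    selector="$2"
    kubectl="${KUBECTL:-kubectl}"
    # Scale down first so pods are not all terminated at once.
    "$kubectl" scale statefulset "$name" --replicas=0 &&
    # Block until every matching pod is gone (the timeout is an arbitrary choice).
    "$kubectl" wait --for=delete pod -l "$selector" --timeout=300s &&
    # Only now remove the StatefulSet object itself.
    "$kubectl" delete statefulset "$name"
}
```

For example, `teardown_statefulset ib-workers app=ib-workers`, where both names are placeholders for your own StatefulSet and label.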

midnattsol commented 4 months ago

I've seen this PR, which modifies the behaviour for switchdev: https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/643

Could this potentially help with the problem?

SchSeba commented 3 months ago

So there is a global lock in ib-sriov-cni that should prevent this.

@e0ne @ykulazhenkov is this something you will be able to take a look at?

adrianchiris commented 2 months ago

Hi,

Can you provide the SriovIbNetwork you defined, as well as the SriovPolicy?

Is the problem only that the RDMA device changes (i.e. mlx5_19 gets recreated/renamed to mlx5_24 after the pod was deleted)? When the new pod starts, does it have the correct mounts? And is UCX unable to cope with RDMA device mlx5_24 having ULPs with a different index (e.g. uverbs19)?

k8snetworkplumbingwg / sriov-network-operator

[StatefulSet/Parallel] Virtual Functions Issues When Creating/Destroying Pods in Parallel. Switching devices #668