k8snetworkplumbingwg / sriov-network-operator

Operator for provisioning and configuring SR-IOV CNI plugin and device plugin
Apache License 2.0

[StatefulSet/Parallel] Virtual Functions Issues When Creating/Destroying Pods in Parallel. Switching devices #668

Open midnattsol opened 4 months ago

midnattsol commented 4 months ago

Environment

Problem Description

When I create the StatefulSet in parallel, or when it terminates (all the pods terminate at the same time), some interfaces randomly switch the PCI device they point to.


So when I check the associated devices on the host, they are completely mixed up:

# cat /sys/class/infiniband_verbs/uverbs{19,20,21,22,23,24}/ibdev
mlx5_24
mlx5_23
mlx5_21
mlx5_22
mlx5_20
mlx5_19

As a result, the pods cannot recognize the mlx interfaces and cannot use them with UCX.
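The check above can be scripted so that a swapped mapping is easier to spot. The following is a minimal sketch, not part of the operator or the CNI: the function name `list_uverbs_map` and its optional sysfs-root argument are hypothetical, and it assumes that a `uverbsN` entry is expected to point at a device whose name ends in `_N`, which is what the output above suggests but which this thread does not establish as guaranteed.

```shell
#!/bin/sh
# Print "uverbsN <ibdev> ok|MISMATCH" for every uverbs entry in sysfs.
# The first argument is a hypothetical override of the sysfs root
# (defaults to /sys) so the function can be tried against a fake tree.
# ASSUMPTION: the numeric suffix of uverbsN should equal the suffix of
# the RDMA device name (e.g. uverbs19 -> mlx5_19).
list_uverbs_map() {
    root="${1:-/sys}"
    for dir in "$root"/class/infiniband_verbs/uverbs*; do
        # Skip when the glob matched nothing or the entry has no ibdev file.
        [ -r "$dir/ibdev" ] || continue
        uverbs=$(basename "$dir")
        ibdev=$(cat "$dir/ibdev")
        uidx=${uverbs#uverbs}   # "uverbs19" -> "19"
        didx=${ibdev##*_}       # "mlx5_24"  -> "24"
        if [ "$uidx" = "$didx" ]; then
            status=ok
        else
            status=MISMATCH
        fi
        printf '%s %s %s\n' "$uverbs" "$ibdev" "$status"
    done
}

list_uverbs_map "$@"
```

Running it on the host before and after a parallel create or delete should show which entries changed. A MISMATCH line only means the indices differ, not necessarily that the device is unusable.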

Workaround so far

Creating the cluster sequentially and scaling to 0 before terminating the StatefulSet helps, because the race condition is not triggered, but I guess this is not the expected behaviour.

midnattsol commented 4 months ago

I've seen this PR, which modifies the behaviour for switchdev: https://github.com/k8snetworkplumbingwg/sriov-network-operator/pull/643

Could this potentially help with the problem?

SchSeba commented 3 months ago

There is a global lock in ib-sriov-cni that should prevent this.

@e0ne @ykulazhenkov is this something you would be able to take a look at?

adrianchiris commented 2 months ago

Hi,

Can you provide the SriovIBNetwork you defined, as well as the SriovNetworkNodePolicy?

Is the problem only that the RDMA device changes (i.e. mlx5_19 gets recreated/renamed to mlx5_24 after the pod was deleted)? When the new pod starts, does it have the correct mounts? And is UCX unable to cope with RDMA device mlx5_24 having ULPs with a different index (e.g. uverbs19)?