k8snetworkplumbingwg / sriov-cni

DPDK & SR-IOV CNI plugin
Apache License 2.0
298 stars 147 forks source link

Move allocated pci deletion in defer of del cmd #278

Open mlguerrero12 opened 1 year ago

mlguerrero12 commented 1 year ago

The allocated pci is supposed to be deleted when the del command is completed. However, placing this logic at the end of the function does not always guaranteed its execution. For instance, if ResetVfConfig or ReleaseVF fail, the err variable is overwritten which causes the cache to be deleted in the defer function and the next del retry to exit without deleting the allocated pci. By placing this logic along the deletion of the cache will ensure that all operations in the del command are done.

mlguerrero12 commented 12 months ago

First of all, it is not correct to rely on protections in the add command. We need to delete the allocation when the del command finishes for all cases. That's with the aim of having consistency. It is not correct to delete for some cases and for others not.

Also, note that the error when the delete allocation fails is useless, the err var is not overwritten which will lead a retry that will surely fail (most likely in ReleaseVf). The cache will be deleted and the next retry will exit again without deleting the allocation.

Second, by introducing this synchronization logic (not to run add before del), we are seeing new issues when for some reason the namespace is not deleted but all networking inside is removed. Before, the plugin simply continue but now it stays stuck in the add command, even though, it is safe to continue.

This won't solve the above, but it will help reduce it by having the previous behavior when the del command correctly finishes for all cases.

SchSeba commented 12 months ago

Hi @mlguerrero12 thanks for the comment!

the thing is that if we failed on moving back the vf because of an issue (and we already encounter some of them) it will not be right to delete the lock because again the next pod that wants to use this will not work.

I agree we can improve when to remove the allocation file if the VF is not in the namespace anymore but for some bug the network namespace is still preset we can remove the lock.

WDYT?

mlguerrero12 commented 11 months ago

@SchSeba, I think we are mixing up the issues here. I agree that we need to find a way to inspect a namespace that failed to be deleted due to a bug to see if the networking has been removed.

The protection that is currently present in the add command is supposed to be when the del command fails to complete. When the del command correctly terminates, the lock is supposed to be deleted. However, this is not always the case since the deletion of the lock is done at the end of the function. For instance, the del command terminates in line 205, 226 and 254 and the lock is not deleted. Do you have a valid reason for this? Also, if the lock deletion fails, the next retry will fail in ReleaseVF for no dpdk drivers. So, it's useless.

What I want to achieve in this PR is to consistently delete the lock for all cases that the del command terminates and not rely on the protection in the add command which is supposed to be only when the del command is skipped.

SchSeba commented 11 months ago

I get your point. but for this to work I think we need also another step we need to change the places where we are creating a new err variable like

if _, err := utils.GetVfid(netConf.DeviceID, netConf.Master); err != nil {
        return fmt.Errorf("cmdDel() error obtaining VF ID: %q", err)
    }

...
if err := sm.ResetVFConfig(netConf); err != nil {
        return fmt.Errorf("cmdDel() error reseting VF: %q", err)
    }
...

this way we will really clean this up only if there was not error in the Delete function no?

mlguerrero12 commented 11 months ago

Based on the comments around the pci allocation code. The main purpose is to have a synchronization mechanism to not execute an add command before a del command.

In config.go, line 41.

" // This is to prevent issues where kubelet request to delete a pod and in the same time a new pod using the same // vf is started. we can have an issue where the cmdDel of the old pod is called AFTER the cmdAdd of the new one // This will block the new pod creation until the cmdDel is done."

Line 71,

" // To prevent a locking of a PCI address for every pciAddress file we also add the netns path where it's been used // This way if for some reason the cmdDel command was not called but the pod namespace doesn't exist anymore // we release the PCI address"

Now, it seems you want to use this as well to detect errors in the del command and force a check in the add command. This is not mentioned anywhere. I thought the protection was only when the del command is not executed.

SchSeba commented 4 months ago

Hi @mlguerrero12 thanks for the comment!

you are right maybe we can add a better comment for this. another use-case is the protection if for some reason the cniDel failed.

also we saw use-cases where crio is not able to remove the network namespace and the VF remains on that namespace instead of moving back into the host network.

in the case without this block, we will get again the generic error in the new pod creation saying enable to find vf interface