Open mfranczy opened 1 year ago
I think the easiest fix for that would be to not return an error when the VM is not found. This would allow the CSI controller to remove the volume attachment.
I could work on it if you approve the solution.
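If ignoring the missing VM turns out to be acceptable, the change could be as small as treating a not-found lookup as a successful (idempotent) unpublish, which is what the CSI spec expects from `ControllerUnpublishVolume` when there is nothing left to detach. A minimal sketch, where `errVMINotFound` and `findVMIByNodeID` are hypothetical stand-ins for the driver's real error and lookup:

```go
package main

import (
	"errors"
	"fmt"
)

// errVMINotFound is a stand-in for the "VM not found" error the driver
// hits after the infra node is drained (hypothetical sentinel).
var errVMINotFound = errors.New("VMI not found")

// findVMIByNodeID is a placeholder for the driver's real lookup
// (getVMNameByCSINodeID in this thread); here it always fails, to
// simulate the drained-node scenario.
func findVMIByNodeID(nodeID string) (string, error) {
	return "", errVMINotFound
}

// controllerUnpublishVolume sketches the proposed fix: a missing VMI
// means there is nothing left to detach, so unpublish succeeds instead
// of blocking volumeattachment deletion.
func controllerUnpublishVolume(volumeID, nodeID string) error {
	vmi, err := findVMIByNodeID(nodeID)
	if errors.Is(err, errVMINotFound) {
		// VMI is gone (evicted and re-created elsewhere): treat the
		// unpublish as already done so the CSI controller can
		// remove the volumeattachment resource.
		return nil
	}
	if err != nil {
		return err
	}
	fmt.Println("would detach", volumeID, "from", vmi)
	return nil
}

func main() {
	fmt.Println(controllerUnpublishVolume("pvc-1234", "node-uuid")) // <nil>
}
```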
So if I understand this correctly: the CSI driver cannot find the VMI because it no longer exists due to the drain on the infra node, and a new VMI exists on a different node due to the live migration? And because the original VMI no longer exists, the CSI driver is throwing that error because it is trying to unpublish from a node that is no longer there.
Instead of ignoring that, maybe we should improve the mechanism that finds the right VMI so it would find the VMI on the new infra node?
> because it no longer exists due to the drain on the infra node, and a new VMI exists on a different node, due to the live migration?

The VMI is not migrated, it is simply re-created. We have the eviction strategy set to External.
> And because the now original VMI no longer exists the CSI driver is throwing that error because it is trying to unpublish from a node that is no longer there.

Exactly.
> Instead of ignoring that, maybe we should improve the mechanism that finds the right VMI so it would find the VMI on the new infra node?

Since we don't live migrate the VMI but recreate it, I don't think that finding it again would help. However, maybe I'm missing something. My reasoning is that a new VMI will not have the volume attached anyway, so executing unpublish on a VMI that doesn't have the volume would probably return an error again (that's my guess, I would have to take a closer look at the implementation).
Okay, so if you recreate the VMI then simply ignoring the error would be correct, but if someone live migrates the VMI, then I am not sure simply ignoring is the correct course of action. The driver should do the right thing in both scenarios IMO.
Understood. So when working on the fix I will consider the eviction strategy. Thanks.
I just checked the getVMNameByCSINodeID function, and it looks up the VMI by vmi.Spec.Domain.Firmware.UUID, not by name or anything like that. So with a live migration this should not error at all. And if you are re-creating the VMI, should it not also have the same Firmware.UUID? I am not 100% sure if that statement is true or not.
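The lookup described above can be sketched roughly as follows. The struct definitions are minimal hypothetical stand-ins for the KubeVirt API types (the real driver uses kubevirt.io/api), kept only to show how the CSI nodeID (the node's SystemUUID) is matched against the VMI's firmware UUID:

```go
package main

import "fmt"

// Minimal stand-ins for the KubeVirt VMI fields discussed in the
// thread (hypothetical; the real driver uses kubevirt.io/api types).
type Firmware struct{ UUID string }
type Domain struct{ Firmware *Firmware }
type VMISpec struct{ Domain Domain }
type VMI struct {
	Name string
	Spec VMISpec
}

// getVMNameByCSINodeID sketches the lookup described in the thread:
// the CSI nodeID is node.Status.NodeInfo.SystemUUID, which the driver
// matches against vmi.Spec.Domain.Firmware.UUID rather than the name.
func getVMNameByCSINodeID(vmis []VMI, nodeID string) (string, bool) {
	for _, vmi := range vmis {
		fw := vmi.Spec.Domain.Firmware
		if fw != nil && fw.UUID == nodeID {
			return vmi.Name, true
		}
	}
	// No VMI carries this firmware UUID: this is the "VM not found"
	// case the driver hits after the infra node is drained.
	return "", false
}

func main() {
	vmis := []VMI{{Name: "worker-1", Spec: VMISpec{Domain: Domain{Firmware: &Firmware{UUID: "abc-123"}}}}}
	name, ok := getVMNameByCSINodeID(vmis, "abc-123")
	fmt.Println(name, ok) // worker-1 true
}
```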
Just digging a little deeper: the nodeID is the node.Status.NodeInfo.SystemUUID. Maybe we should look for node.status.volumesAttached instead? And if the volume is not found there, we can assume it was already detached.
> maybe we should look for node.status.volumesAttached instead? and if not found, should be like okay, we already detached the volume.

Sounds promising, I will dig into it and see if there are problems with it.
> And if you re-creating the VMI should it not also have the same Firmware.UUID

That would block the workload eviction, because to make sure there is no volume attached we would have to wait for a new VMI node to join the cluster.

I like your idea with node.status.volumesAttached. The name of the attached volume in the status is composed of the CSI driver name and the volume name, so I would match on both the CSI driver and the volume name.
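The matching described above could be sketched like this. The types are minimal hypothetical stand-ins for the corev1 node status (the real driver would use k8s.io/api/core/v1), and the `kubernetes.io/csi/<driver>^<handle>` layout is an assumption based on how kubelet names CSI attachments:

```go
package main

import "fmt"

// Minimal stand-ins for the corev1 node status types referenced in the
// thread (hypothetical; the real driver uses k8s.io/api/core/v1).
type AttachedVolume struct {
	Name string // e.g. "kubernetes.io/csi/csi.kubevirt.io^pvc-1234"
}

type NodeStatus struct {
	VolumesAttached []AttachedVolume
}

// isVolumeAttached reports whether the node's status still lists the
// volume, matching on both the CSI driver name and the volume handle
// as suggested in the thread. The name layout is an assumption.
func isVolumeAttached(status NodeStatus, driverName, volumeHandle string) bool {
	want := fmt.Sprintf("kubernetes.io/csi/%s^%s", driverName, volumeHandle)
	for _, v := range status.VolumesAttached {
		if v.Name == want {
			return true
		}
	}
	// Not listed: assume the volume was already detached, so
	// unpublish can succeed and the volumeattachment can be removed.
	return false
}

func main() {
	st := NodeStatus{VolumesAttached: []AttachedVolume{
		{Name: "kubernetes.io/csi/csi.kubevirt.io^pvc-1234"},
	}}
	fmt.Println(isVolumeAttached(st, "csi.kubevirt.io", "pvc-1234")) // true
	fmt.Println(isVolumeAttached(st, "csi.kubevirt.io", "pvc-9999")) // false
}
```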
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
Looking forward to it being fixed too...
/remove-lifecycle rotten
/lifecycle frozen
In my case the k8s master was recreated. After the master became ready, the attach/detach controller saw the volume as already attached (node.status.volumesAttached still listed the test PVC), then the volume manager called NodeStageVolume, the mount failed, and it kept retrying until the pod became Unknown. It seems like the same problem as issue 83.
It seems that this situation cannot be judged using node.status.volumesAttached alone.
> Just digging a little deeper, the nodeID is the node.Status.NodeInfo.SystemUUID. maybe we should look for node.status.volumesAttached instead? and if not found, should be like okay, we already detached the volume.
Is this a BUG REPORT or FEATURE REQUEST?: /kind bug
What happened: While draining KubeVirt infrastructure nodes (bare-metal nodes) we evict the virtual machine workload to other VM nodes. Sometimes during the eviction process the recreated pod on a different VM gets an error (not always):
Further investigation showed that this is because a volumeattachment resource is not being deleted, due to an error coming from the KubeVirt CSI driver. There is a race condition between VM and volume attachment deletion.
What you expected to happen: Volume attachment resources for non-existing VMs to be deleted.
How to reproduce it (as minimally and precisely as possible):
That's the easiest way to trigger it (with a deployment it's harder to spot the bug, as it depends on reconciliation timing).
Anything else we need to know?: I think this is a problematic part of the code: https://github.com/kubevirt/csi-driver/blob/main/pkg/service/controller.go#L320-L323
Environment:
- cc71b72b8d5a205685985244c61707c5e40c9d5f
- Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-05-19T19:39:28Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}