Preisschild closed this issue 10 months ago
I don't know why it logs "cannot perform operation because volume is locked (locked)". I did not manually lock the disk in the Hetzner Console.
When you make changes to Volumes (or Servers, ...), the resource is "locked" in the API; this lock is lifted once the action has finished.
This looks pretty weird to me, so I took a look at the API requests & Actions in our backend. I noticed 3 things, which in tandem might explain the symptoms:
We have elevated error rates on attaching volumes right now, might be related: https://status.hetzner.com/incident/e50d7b28-d0a7-4014-ba14-be70f46032ff
I saw 3 requests for detaching, arriving within tens of milliseconds. Are you by any chance running 3 replicas of the controller without enabling the leader election?
If you want to run with multiple replicas, please make sure to enable leader election on all sidecar containers https://github.com/hetznercloud/csi-driver/blob/604cbb3fed1b4b045c76d1b99f6ef4ec2c2eab01/chart/templates/controller/deployment.yaml#L83-L86
It's not indicated in your logs, but I found a single attach action that was marked as failed. This is possibly related to the previous two problems, as one of the other replicas might have processed that attach.
Ah that explains it. It was running with multiple replicas, but without leader election enabled. Thank you.
Closing this issue
That at least explains the missing log messages and locked volumes. It should be better with a single replica doing the work; the events & logs should then be easier to follow.
TL;DR
We sometimes have Pods with PVCs stuck in the ContainerCreating stage forever: mounting the filesystem fails because the Hetzner Volume isn't attached to the instance in the first place.
Expected behavior
Volumes should always be attached automatically
Observed behavior
The VolumeAttachment exists and kubelet tries to mount the device with LUKS, but it can't find the device because the Volume isn't attached to the instance
Minimal working example
I'm not sure how to reproduce this yet. I think it may be a race condition that occurs when a pod is evicted from a node and then scheduled onto a new node
Log output
hcloud-csi-controller:
Additional information
I don't know why it logs "cannot perform operation because volume is locked (locked)". I did not manually lock the disk in the Hetzner Console.