kubernetes-sigs / gcp-compute-persistent-disk-csi-driver

The Google Compute Engine Persistent Disk (GCE PD) Container Storage Interface (CSI) Storage Plugin.
Apache License 2.0

Race when running multiple controller replicas #516

Closed ffilippopoulos closed 4 years ago

ffilippopoulos commented 4 years ago

We are running the following configuration:

kube: v1.17.3
os-image: Flatcar Container Linux by Kinvolk 2512.2.0 (Oklo)
kernel: 4.19.124-flatcar
docker: docker://18.6.3

We deploy the CSI driver from github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/deploy/kubernetes/overlays/stable?ref=v0.7.0

We decided to run 2 replicas of the controller: since it is a StatefulSet, we assumed multiple replicas would be supported (we couldn't find any documentation saying whether that is allowed).

csi-gce-pd-controller-0                                     4/4     Running   5          45h
csi-gce-pd-controller-1                                     4/4     Running   5          45h

This setup sporadically leaves containers stuck in the ContainerCreating state, as Kubernetes reports a Multi-Attach error:

53s         Warning   FailedAttachVolume             pod/prometheus-1                          Multi-Attach error for volume "pvc-72f6c84c-6e0b-4e33-90e8-8ad22bdb653b" Volume is already exclusively attached to one node and can't be attached to another
29s         Warning   FailedAttachVolume             pod/prometheus-1                          AttachVolume.Attach failed for volume "pvc-72f6c84c-6e0b-4e33-90e8-8ad22bdb653b" : rpc error: code = Internal desc = unknown Attach error: failed when waiting for zonal op: operation operation-1590671984460-5a6b52e968410-035a9d7d-a615df1e failed (RESOURCE_IN_USE_BY_ANOTHER_RESOURCE): The disk resource 'projects/uw-dev/zones/europe-west2-c/disks/pvc-72f6c84c-6e0b-4e33-90e8-8ad22bdb653b' is already being used by 'projects/uw-dev/zones/europe-west2-c/instances/worker-k8s-pbdr'

We have noticed that these logs come from one of the two controllers (the one that issues the attach command second):

csi-gce-pd-controller-1 csi-attacher I0528 13:17:12.188224       1 csi_handler.go:99] Error processing "csi-8fb2cee8e7ba0ead8206f1ba5a8f66d3d6b8273fc110e9be399eec76355051ca": failed to attach: rpc error: code = Internal desc = unknown Attach error: failed when waiting for zonal op: operation operation-1590671819790-5a6b524c5d9a5-4e4fb545-80ff59c8 failed (RESOURCE_IN_USE_BY_ANOTHER_RESOURCE): The disk resource 'projects/uw-dev/zones/europe-west2-c/disks/pvc-72f6c84c-6e0b-4e33-90e8-8ad22bdb653b' is already being used by 'projects/uw-dev/zones/europe-west2-c/instances/worker-k8s-hw1v'

I am not sure whether this is a bug in the controller when running multiple replicas, or whether csi-gce-pd-controller is designed to always run as a single pod.

Could someone help us with this, or point us to the relevant documentation so we can see what we are missing?

msau42 commented 4 years ago

More than one controller running concurrently is not supported, unless you enable leader election so that only one controller will be active at a time.

ffilippopoulos commented 4 years ago

@msau42 thank you very much for clearing this up. Can you point me to how to enable leader election? Other than that, I reckon this issue can be closed.

msau42 commented 4 years ago

Each of the CSI sidecars (provisioner, attacher, resizer, snapshotter) has --leader-election and --leader-election-namespace flags that should be passed to it. Note that some older sidecar versions supported multiple leader election methods, so for those versions --leader-election-type=leases (requires k8s 1.14) should be passed as well.

@verult would you be able to update our specs to enable that?
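
As an illustration of the flags described above, here is a minimal sketch of how one sidecar's container entry in the controller pod spec might pass them. Only --leader-election, --leader-election-namespace and --leader-election-type=leases come from this thread; the container name, the image placeholder, the socket path and the PDCSI_NAMESPACE variable are assumptions chosen for the example, not values taken from this driver's manifests. The same flags would need to be repeated on each of the other sidecars.

```yaml
# Sketch only: names, image and socket path are illustrative assumptions.
containers:
  - name: csi-attacher
    image: "REPLACE_WITH_CSI_ATTACHER_IMAGE"   # placeholder, not a real image reference
    args:
      - "--csi-address=/csi/csi.sock"          # assumed socket path
      - "--leader-election"                    # only the elected replica acts on volumes
      - "--leader-election-namespace=$(PDCSI_NAMESPACE)"
      # Older sidecar versions only, per the comment above:
      # - "--leader-election-type=leases"
    env:
      - name: PDCSI_NAMESPACE                  # hypothetical variable name
        valueFrom:
          fieldRef:
            fieldPath: metadata.namespace      # namespace the pod runs in
```

With lease-based election, the sidecar's service account would also need permission to create and update Lease objects in that namespace; check the RBAC shipped with the sidecar version in use.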

verult commented 4 years ago

Sure thing. Note that it might be better to use a Deployment instead of a StatefulSet for multiple replicas: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md#cluster-level-deployment

/assign
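
For reference, a rough sketch of what a multi-replica controller could look like as a Deployment rather than a StatefulSet, as suggested above. This is not the spec that was eventually merged; the labels, the image placeholder and the socket path are assumptions, and the driver container, remaining sidecars, volumes and service account are left out for brevity.

```yaml
# Sketch only: labels, image and socket path are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csi-gce-pd-controller
spec:
  replicas: 2                      # safe only with leader election enabled on every sidecar
  selector:
    matchLabels:
      app: csi-gce-pd-controller   # hypothetical label
  template:
    metadata:
      labels:
        app: csi-gce-pd-controller
    spec:
      containers:
        - name: csi-attacher
          image: "REPLACE_WITH_CSI_ATTACHER_IMAGE"   # placeholder
          args:
            - "--csi-address=/csi/csi.sock"          # assumed socket path
            - "--leader-election"
            - "--leader-election-namespace=$(PDCSI_NAMESPACE)"
          env:
            - name: PDCSI_NAMESPACE                  # hypothetical variable name
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
        # driver container and the other sidecars omitted from this sketch
```

Both replicas stay running, but only the replica holding the lease issues attach calls, which avoids the duplicate attach seen in the logs above.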