Open msau42 opened 1 year ago
cc @jsafrane @gnufied
IIRC, the external-attacher always calls ControllerUnpublish when a VolumeAttachment gets a DeletionTimestamp, regardless of whether the previous attach succeeded, and regardless of whether a failed attach returned a final error or a temporary one. In this sense, an attachment is always "uncertain" until fully attached.
It could be possible to optimize and mark attachments as "not attached" after a final ControllerPublish error and skip ControllerUnpublish in this case. I am not sure it's worth the effort.
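To make the optimization concrete, here is a minimal Go sketch of the decision. It is not the external-attacher's code: the `Code` type is a hypothetical stand-in for gRPC status codes, and `AttachState`, `stateAfterPublish` and `needsUnpublish` are names invented for illustration. The set of codes treated as temporary is an assumption about which failures can leave an attach in flight.

```go
package main

// Code is a hypothetical stand-in for a gRPC status code. A real sidecar
// would use google.golang.org/grpc/codes instead.
type Code int

const (
	OK Code = iota
	Canceled
	DeadlineExceeded
	Unavailable
	ResourceExhausted
	Aborted
	InvalidArgument
	NotFound
)

// isFinalError reports whether a failed ControllerPublish is assumed to leave
// no attach operation in flight. Cancellations, timeouts and "operation
// pending" style codes are treated as temporary (assumption for this sketch).
func isFinalError(c Code) bool {
	switch c {
	case Canceled, DeadlineExceeded, Unavailable, ResourceExhausted, Aborted:
		return false
	}
	return true
}

// AttachState is a hypothetical three-way state for one attachment.
type AttachState int

const (
	NotAttached AttachState = iota
	Uncertain
	Attached
)

// stateAfterPublish maps the outcome of one ControllerPublish call to a state.
func stateAfterPublish(c Code) AttachState {
	switch {
	case c == OK:
		return Attached
	case isFinalError(c):
		return NotAttached
	default:
		return Uncertain
	}
}

// needsUnpublish is the proposed optimization: ControllerUnpublish is skipped
// only when the attachment is known to be not attached.
func needsUnpublish(s AttachState) bool {
	return s != NotAttached
}
```

With this shape, a timeout still leads to ControllerUnpublish on deletion, while an InvalidArgument failure would not.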
I agree it sounds like it should not cause problems, but one could imagine that some plugin depends on the current Kubernetes behavior to avoid leaking resources.
As I read it, the spec is ambiguous about the requirements for the CO in this area. There is a lot of "the CO may choose to" language that suggests the CO has little if any obligation here.
The main challenge is that the current behavior relies on the plugin to potentially keep state about pending attach requests, which may be difficult. You can have a sequence like:
Some ways that a plugin could address this are:
However, keeping track of pending operations may be difficult. For example, in GCP, operation metadata only provides the instance id, not the disk id. So the CSI driver would potentially have to 1) serialize operations per instance, which would not be good for performance, 2) cache operations in memory and only serialize on restart, or 3) keep operation state in some CRD.
For reference, here is a prototype of adding operation caching in the GCP driver: https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/pull/923
Discussed at the triage meeting. We'll work on a design that is similar to how provisioner works and the attach/detach operations will be synchronized with an operations cache.
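As a rough illustration of what synchronizing attach and detach through an operations cache could look like, here is a small in-memory sketch in Go. All names (`OpCache`, `Begin`, `Finish`, `Pending`, `ErrPending`) are hypothetical and do not come from the external-attacher, the external-provisioner or the GCP driver. Being in-memory, it loses its state on restart, which is the limitation already noted above.

```go
package main

import (
	"errors"
	"sync"
)

// OpKind names the kind of operation in flight (hypothetical type).
type OpKind string

const (
	OpAttach OpKind = "attach"
	OpDetach OpKind = "detach"
)

// key identifies one volume/node pair.
type key struct{ volumeID, nodeID string }

// ErrPending is returned when another operation is already in flight for
// the same volume/node pair.
var ErrPending = errors.New("operation pending for volume")

// OpCache tracks at most one in-flight operation per volume/node pair.
type OpCache struct {
	mu      sync.Mutex
	pending map[key]OpKind
}

// NewOpCache returns an empty cache.
func NewOpCache() *OpCache {
	return &OpCache{pending: make(map[key]OpKind)}
}

// Begin registers an operation, or fails with ErrPending if one is already
// recorded for the same pair.
func (c *OpCache) Begin(volumeID, nodeID string, kind OpKind) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := key{volumeID, nodeID}
	if _, busy := c.pending[k]; busy {
		return ErrPending
	}
	c.pending[k] = kind
	return nil
}

// Finish removes the entry once the operation has reached a terminal state.
func (c *OpCache) Finish(volumeID, nodeID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.pending, key{volumeID, nodeID})
}

// Pending reports which operation, if any, is in flight for the pair.
func (c *OpCache) Pending(volumeID, nodeID string) (OpKind, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	kind, ok := c.pending[key{volumeID, nodeID}]
	return kind, ok
}
```

A detach that arrives while an attach is still recorded would get ErrPending and be retried later, rather than racing with the attach.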
/priority important-soon
/triage accepted
I will work on this issue, but I am unable to assign it to myself.
This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.
You can:
/triage accepted (org members only)
/priority important-longterm or /priority backlog
/close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/priority important-longterm /triage accepted
/priority important-longterm /triage accepted
/remove-priority important-soon
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale is applied
lifecycle/stale was applied, lifecycle/rotten is applied
lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
It looks like we check for finalError on ControllerPublishVolume: https://github.com/kubernetes-csi/external-attacher/blob/c4cfca3cd22d7437f0a1d712e3fac30211d6ec09/pkg/attacher/attacher.go#L70
But we ignore it later on: https://github.com/kubernetes-csi/external-attacher/blob/c4cfca3cd22d7437f0a1d712e3fac30211d6ec09/pkg/controller/csi_handler.go#L513
And we rely on the driver's implementation of ControllerUnpublishVolume to properly detect whether an attach operation is still in progress.
This is different from how we handle other uncertain states for volume operations like provision and mount. Should we consider adding uncertain handling for attach?