Open bobbypage opened 1 year ago
/sig storage /sig node
/cc @msau42
/kind bug
/triage accepted /priority important-longterm
doesn't look like a regression
/cc @xmcqueen
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
After a period of inactivity, lifecycle/stale is applied
After further inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After further inactivity since lifecycle/rotten was applied, the issue is closed
You can:
Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
/triage accepted (org members only)
/close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/lifecycle stale
/lifecycle rotten
/remove lifecycle-rotten
Another one for https://github.com/kubernetes/test-infra/issues/32957
This issue looks like it may be tricky to solve but remains a source of flakes and should stay tracked.
What happened?
Kubelet has an internal syncTerminatedPod function, which is called after pods are terminated. The function performs final pod cleanup and is responsible for ensuring that volumes mounted to the pod are unmounted. To do so, it calls volumeManager.WaitForUnmount: https://github.com/kubernetes/kubernetes/blob/7d9c0e0a78e519cac0f892a8be2f063bedc94bad/pkg/kubelet/kubelet.go#L1881-L1885
While testing for a different issue, I came across a problem with how emptyDir unmounting is handled: it looks like volumeManager.WaitForUnmount will return successfully even if the emptyDir was not unmounted successfully.
Chatted with @msau42 about this issue, and it seems this is because WaitForUnmount only checks for volumes in the mounted state: https://github.com/kubernetes/kubernetes/blob/7d9c0e0a78e519cac0f892a8be2f063bedc94bad/pkg/kubelet/volumemanager/cache/actual_state_of_world.go#L965
However, if there is an error during unmounting, the volume is marked as "uncertain" (https://github.com/kubernetes/kubernetes/blob/7d9c0e0a78e519cac0f892a8be2f063bedc94bad/pkg/volume/util/operationexecutor/operation_generator.go#L879), which results in WaitForUnmount succeeding despite the error during unmounting. It's unclear if this is expected behavior.
What did you expect to happen?
I expected that volumeManager.WaitForUnmount would block (or return an error) if the emptyDir had an error unmounting.
How can we reproduce it (as minimally and precisely as possible)?
Create the following pod:
Get the pod uid
Enter the kind-worker (node) and run chattr +i on the emptyDir. This makes the emptyDir volume immutable and prevents it from being unmounted (and deleted).
Now delete the pod:
Here are the logs:
Kubelet logs: https://gist.github.com/4f9b1fa0edda8d260c90a1f18c9dc6e5
Kubelet logs for test-pd pod: https://gist.github.com/d63b8713b71bef1e712ca138fcb5d602
The notable logs:
Termination (syncTerminatingPod):
But syncTerminatedPod succeeded, despite the volume not actually unmounting. This is because WaitForUnmount succeeded incorrectly:
The volume unmount continues to be retried later.
Anything else we need to know?
No response
Kubernetes version
1.25.2
Cloud provider
n/a
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)