kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Kubelet using PIDs for scope names can result in collisions and containers crash-looping. #90327

Closed: vespian closed this issue 4 years ago

vespian commented 4 years ago

What happened: One of our deployments failed because of a PID collision while mounting a config map. Kubelet uses the systemd-run command to mount config maps [1], but unfortunately the way systemd-run chooses scope names is very simple: by default it derives them from PIDs, which can lead to collisions [2]. As a remedy, we could make kubelet pass an explicit, unique unit name via the "--unit" option (e.g. one generated with uuidgen), as recommended in [2]. A sketch of the current behavior follows the references below.

[1] https://github.com/kubernetes/kubernetes/blob/224be7bdce5a9dd0c2fd0d46b83865648e2fe0ba/pkg/util/mount/mount_linux.go#L97-L145
[2] https://lists.freedesktop.org/archives/systemd-devel/2015-October/034591.html
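
For context, a minimal sketch of how the mount path in [1] wraps `mount` in a transient systemd scope; this is a simplified approximation, not the real kubelet code, and the helper name and error handling here are illustrative. Because no `--unit` is passed, systemd names the scope `run-<PID>.scope` after the `systemd-run` process, so a recycled PID can collide with an old scope unit that has not been cleaned up yet, which is exactly the failure visible in the log below.

```go
// Simplified approximation of how kubelet invokes systemd-run for mounts (see [1]).
// Function name, error handling, and the example paths are illustrative.
package main

import (
	"fmt"
	"os/exec"
)

// mountWithSystemdScope runs `mount` inside a transient systemd scope so the
// mount survives a kubelet restart. Without an explicit --unit, systemd names
// the scope run-<PID>.scope after the systemd-run process, which can collide
// when a PID is reused while a stale scope unit with that name still exists.
func mountWithSystemdScope(source, target string, options []string) error {
	args := []string{
		fmt.Sprintf("--description=Kubernetes transient mount for %s", target),
		"--scope",
		"--",
		"mount",
	}
	for _, o := range options {
		args = append(args, "-o", o)
	}
	args = append(args, source, target)

	out, err := exec.Command("systemd-run", args...).CombinedOutput()
	if err != nil {
		// This is where "Failed to start transient scope unit:
		// Unit run-<PID>.scope already exists." surfaces.
		return fmt.Errorf("mount failed: %v, output: %s", err, out)
	}
	return nil
}

func main() {
	// Hypothetical call mirroring the failing bind mount from the log below.
	_ = mountWithSystemdScope("/proc/969/fd/22",
		"/var/lib/kubelet/pods/<pod-uid>/volume-subpaths/sc-dashboard-provider/grafana/4",
		[]string{"bind"})
}
```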

Mar 02 19:07:17 533389745-worker-pool0-0 systemd[1]: Started Kubernetes transient mount for /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/config/grafana/2.
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: E0302 19:07:17.738034     969 mount_linux.go:140] Mount failed: exit status 1
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Mounting command: systemd-run
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4 --scope -- mount -o bind /proc/969/fd/22 /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Output: Failed to start transient scope unit: Unit run-16873.scope already exists.
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: E0302 19:07:17.740632     969 subpath_linux.go:191] Failed to clean subpath "/var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4": error cleaning subpath mount /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4: unmount failed: exit status 32
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Unmounting arguments: /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Output: umount: /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4: not mounted
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: E0302 19:07:17.740655     969 kubelet_pods.go:226] failed to prepare subPath for volumeMount "sc-dashboard-provider" of container "grafana": error mounting /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volumes/kubernetes.io~configmap/sc-dashboard-provider/..2020_03_02_19_06_29.000347228/provider.yaml: mount failed: exit status 1
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Mounting command: systemd-run
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4 --scope -- mount -o bind /proc/969/fd/22 /var/lib/kubelet/pods/f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59/volume-subpaths/sc-dashboard-provider/grafana/4
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: Output: Failed to start transient scope unit: Unit run-16873.scope already exists.
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: E0302 19:07:17.740690     969 kuberuntime_manager.go:783] container start failed: CreateContainerConfigError: failed to prepare subPath for volumeMount "sc-dashboard-provider" of container "grafana"
Mar 02 19:07:17 533389745-worker-pool0-0 kubelet[969]: E0302 19:07:17.740719     969 pod_workers.go:191] Error syncing pod f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59 ("kommander-kubeaddons-grafana-67795d8b79-794zv_kommander(f9c3dd5a-5e0d-4df8-b361-b98fdfa4fa59)"), skipping: failed to "StartContainer" for "grafana" with CreateContainerConfigError: "failed to prepare subPath for volumeMount \"sc-dashboard-provider\" of container \"grafana\""
Mar 02 19:07:17 533389745-worker-pool0-0 containerd[694]: time="2020-03-02T19:07:17.779144367Z" level=info msg="PullImage "grafana/grafana:6.6.0""
Mar 02 19:07:18 533389745-worker-pool0-0 containerd[694]: time="2020-03-02T19:07:18.394064710Z" level=info msg="ImageUpdate event &ImageUpdate{Name:docker.io/grafana/grafana:6.6.0,Labels:map[string]string{},}"

What you expected to happen: The deployment finishes successfully.

How to reproduce it (as minimally and precisely as possible): This happened in our CI; it should reproduce eventually given enough tries, since it depends on PID reuse.

Anything else we need to know?:

Environment:

vespian commented 4 years ago

/sig node

tedyu commented 4 years ago

The referenced file seems to have been relocated in the master branch: ./vendor/k8s.io/utils/mount/mount_linux.go

Do you want to submit a PR ?

vespian commented 4 years ago

> Do you want to submit a PR ?

Yes :)

vespian commented 4 years ago

More information on the matter:

So even though this was already fixed upstream, I believe we should still fix it in Kubernetes as well. The problem here is CentOS 7, as there are still lots of people and companies out there using it.

Information on EOL timelines:

vespian commented 4 years ago

https://github.com/kubernetes/utils/pull/162
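
For reference, a minimal sketch of the remedy described in the issue text (giving each transient scope an explicit, unique unit name so systemd no longer derives it from the PID). This only illustrates the idea and may not match what the linked PR actually merged; the github.com/google/uuid dependency and the helper name are assumptions.

```go
// Sketch of the proposed fix: pass a unique --unit name to systemd-run so PID
// reuse cannot trigger "Unit run-<PID>.scope already exists".
// Illustrative only; github.com/google/uuid and the helper name are assumptions.
package main

import (
	"fmt"

	"github.com/google/uuid"
)

// addSystemdScopeWithUnit prepends a systemd-run wrapper to a command line,
// using a randomly generated unit name instead of the PID-derived default.
func addSystemdScopeWithUnit(systemdRunPath, mountName, command string, args []string) (string, []string) {
	systemdRunArgs := []string{
		fmt.Sprintf("--description=Kubernetes transient mount for %s", mountName),
		fmt.Sprintf("--unit=%s", uuid.New().String()),
		"--scope",
		"--",
		command,
	}
	return systemdRunPath, append(systemdRunArgs, args...)
}

func main() {
	// Hypothetical usage mirroring the bind mount from the log above.
	cmd, cmdArgs := addSystemdScopeWithUnit("systemd-run",
		"/var/lib/kubelet/pods/<pod-uid>/volume-subpaths/config/grafana/2",
		"mount",
		[]string{"-o", "bind", "/proc/969/fd/22", "/var/lib/kubelet/pods/<pod-uid>/volume-subpaths/config/grafana/2"})
	fmt.Println(cmd, cmdArgs)
}
```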

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubernetes/issues/90327#issuecomment-698946221):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.