warm-metal / container-image-csi-driver

Kubernetes CSI driver for mounting container images
MIT License

failed: lsetxattr read-only file system on pod start #145

Open woehrl01 opened 7 months ago

woehrl01 commented 7 months ago

Hi,

I'd like to test out this really promising CSI driver, but I receive the following error:

spec: failed to apply OCI options: relabel "/var/lib/kubelet/pods/ee74a41d-f2b5-4ac3-9722-80454387d5c9/volume-subpaths/source/nginx/18" with "system_u:object_r:data_t:s0:c246,c908" failed: lsetxattr /var/lib/kubelet/pods/ee74a41d-f2b5-4ac3-9722-80454387d5c9/volume-subpaths/source/nginx/18/p: read-only file system

I already changed the mount and the pod to be readable, but I still get that error. I'm using EKS 1.28 with Bottlerocket nodes.

Any ideas what I could try?

Edit: I got it working by setting readOnly: true on the volume directly. Any idea how I can troubleshoot why a writable volume does not work?

Thanks!

mbtamuli commented 7 months ago

Hey @woehrl01 could you share a minimal manifest that could help reproduce the issue?

woehrl01 commented 7 months ago

Thank you @mbtamuli

Absolutely!

Actually I'm using all defaults. The driver is installed with default helm values (1.1.0 and containerd) and I'm using the following configuration:

https://github.com/warm-metal/container-image-csi-driver/blob/v1.1.0/sample/ephemeral-volume.yaml

It works for me as soon as I add the readOnly flag to the volume definition.
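For reference, the working variant looks roughly like this (a sketch based on the sample manifest linked above; the image value is a placeholder for your own image):

```yaml
# Ephemeral CSI volume with the readOnly workaround applied.
volumes:
  - name: source
    csi:
      driver: csi-image.warm-metal.tech
      readOnly: true  # without this, the pod fails with the lsetxattr error
      volumeAttributes:
        image: "docker.io/warmmetal/csi-image-test:simple-fs"  # placeholder
```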

mugdha-adhav commented 7 months ago

@woehrl01 could you please share the logs from node-plugin daemonset pod which is running on the same node where the workload was deployed?

woehrl01 commented 7 months ago

Feb 14, 2024 18:16:56.928
config.go:144] looking for config.json at /config.json

Feb 14, 2024 18:16:56.928
config.go:144] looking for config.json at /config.json

Feb 14, 2024 18:16:56.928
config.go:144] looking for config.json at /root/.docker/config.json

Feb 14, 2024 18:16:56.929
config.go:144] looking for config.json at /.docker/config.json

Feb 14, 2024 18:16:56.929
config.go:110] looking for .dockercfg at /.dockercfg

Feb 14, 2024 18:16:56.929
config.go:110] looking for .dockercfg at /.dockercfg

Feb 14, 2024 18:16:56.929
config.go:110] looking for .dockercfg at /root/.dockercfg

Feb 14, 2024 18:16:56.929
config.go:110] looking for .dockercfg at /.dockercfg

Feb 14, 2024 18:16:56.929
provider.go:82] Docker config file not found: couldn't find valid .dockercfg after checking in [ /root /]

Feb 14, 2024 18:16:56.932
aws_credentials.go:180] unable to get ECR credentials from cache, checking ECR API

Feb 14, 2024 18:16:56.944
aws_credentials.go:295] AWS request: ecr:GetAuthorizationToken in eu-central-1

Feb 14, 2024 18:16:57.022
aws_credentials.go:187] Got ECR credentials from ECR API for ecr_url

Feb 14, 2024 18:17:20.514
pullexecutor.go:92] "Finished pulling image" pod-name="" namespace="" uid="" request-id="10b0b1a4-0b65-41f0-a51c-418f38296289" image="someimage:versiontag" pull-duration="23.583414224s" image-size="161.81 MiB"

Feb 14, 2024 18:17:20.514
mountexecutor.go:63] "Mounting image" pod-name="" namespace="" uid="" request-id="10b0b1a4-0b65-41f0-a51c-418f38296289" image="someimage"

Feb 14, 2024 18:17:20.564
containerd.go:82] image "someimage:versiontag" unpacked

Feb 14, 2024 18:17:20.566
mounter.go:193] create read-write snapshot of image "someimage:versiontag" with key "csi-image.warm-metal.tech-csi-38e100b5eeb529e82e9cfb68091a734c716db143c2be52cf78b720df72f331cd"

Feb 14, 2024 18:17:20.568
containerd.go:120] create rw snapshot "csi-image.warm-metal.tech-csi-38e100b5eeb529e82e9cfb68091a734c716db143c2be52cf78b720df72f331cd" for image "sha256:f15ada610a14d12f77b5180515b5df4a7dc81bf5cff0dcf35e929f9f6968cb87" with metadata map[string]string{"containerd.io/gc.root":"2024-02-14T17:17:20Z"}

Feb 14, 2024 18:17:20.581
mountexecutor.go:87] "Finished mounting" pod-name="" namespace="" uid="" request-id="10b0b1a4-0b65-41f0-a51c-418f38296289" image="someimage" mount-duration="66.150696ms"

Feb 14, 2024 18:17:30.040
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:17:47.720
utils.go:97] GRPC call: /csi.v1.Node/NodeGetCapabilities

Feb 14, 2024 18:18:29.938
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:19:07.579
utils.go:97] GRPC call: /csi.v1.Node/NodeGetCapabilities

Feb 14, 2024 18:19:29.938
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:25:15.557
utils.go:97] GRPC call: /csi.v1.Node/NodePublishVolume

Feb 14, 2024 18:25:15.563
node_server.go:64] "Incoming NodePublishVolume request" pod-name="" namespace="" uid="" request-id="c74dea77-bfbb-4c26-a7e4-d33917a022c6" request string="volume_id:\"csi-ab8d5c61d4c47fe3a6f19f5e916c4e4ff1f812848b01adbf694a27d4ca2bc3ae\" target_path:\"/var/lib/kubelet/pods/b961b108-161d-45c5-9430-81eaa8d2bd29/volumes/kubernetes.io~csi/source/mount\" volume_capability:<mount:<> access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"csi.storage.k8s.io/ephemeral\" value:\"true\" > volume_context:<key:\"csi.storage.k8s.io/pod.name\" value:\"combined-web-5ff68fbfdb-7tjn8\" > volume_context:<key:\"csi.storage.k8s.io/pod.namespace\" value:\"p67747\" > volume_context:<key:\"csi.storage.k8s.io/pod.uid\" value:\"b961b108-161d-45c5-9430-81eaa8d2bd29\" > volume_context:<key:\"csi.storage.k8s.io/serviceAccount.name\" value:\"default\" > volume_context:<key:\"image\" value:\"someimage:versiontag\" > "

Feb 14, 2024 18:25:15.579
mount_linux.go:286] 'umount /tmp/kubelet-detect-safe-umount2272271457' failed with: exit status 1, output: umount: can't unmount /tmp/kubelet-detect-safe-umount2272271457: Invalid argument

Feb 14, 2024 18:25:15.579
mount_linux.go:288] Detected umount with unsafe 'not mounted' behavior

Feb 14, 2024 18:25:15.582
plugins.go:73] Registering credential provider: .dockercfg

Feb 14, 2024 18:25:15.582
plugins.go:73] Registering credential provider: amazon-ecr

Feb 14, 2024 18:25:15.589
mountexecutor.go:63] "Mounting image" pod-name="" namespace="" uid="" request-id="c74dea77-bfbb-4c26-a7e4-d33917a022c6" image="someimage"

Feb 14, 2024 18:25:15.605
containerd.go:82] image "someimage:versiontag" unpacked

Feb 14, 2024 18:25:15.606
mounter.go:193] create read-write snapshot of image "someimage:versiontag" with key "csi-image.warm-metal.tech-csi-ab8d5c61d4c47fe3a6f19f5e916c4e4ff1f812848b01adbf694a27d4ca2bc3ae"

Feb 14, 2024 18:25:15.607
containerd.go:120] create rw snapshot "csi-image.warm-metal.tech-csi-ab8d5c61d4c47fe3a6f19f5e916c4e4ff1f812848b01adbf694a27d4ca2bc3ae" for image "sha256:f15ada610a14d12f77b5180515b5df4a7dc81bf5cff0dcf35e929f9f6968cb87" with metadata map[string]string{"containerd.io/gc.root":"2024-02-14T17:25:15Z"}

Feb 14, 2024 18:25:15.619
mountexecutor.go:87] "Finished mounting" pod-name="" namespace="" uid="" request-id="c74dea77-bfbb-4c26-a7e4-d33917a022c6" image="someimage" mount-duration="29.932342ms"

Feb 14, 2024 18:25:29.938
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:25:47.739
utils.go:97] GRPC call: /csi.v1.Node/NodeGetCapabilities

Feb 14, 2024 18:26:29.938
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:27:43.251
utils.go:97] GRPC call: /csi.v1.Node/NodeUnpublishVolume

Feb 14, 2024 18:27:43.252
node_server.go:180] unmount request: volume_id:"csi-38e100b5eeb529e82e9cfb68091a734c716db143c2be52cf78b720df72f331cd" target_path:"/var/lib/kubelet/pods/fb74c7d2-2668-4b5b-8849-0d18cb98ee30/volumes/kubernetes.io~csi/source/mount"

Feb 14, 2024 18:27:43.282
mount_linux.go:286] 'umount /tmp/kubelet-detect-safe-umount566345078' failed with: exit status 1, output: umount: can't unmount /tmp/kubelet-detect-safe-umount566345078: Invalid argument

Feb 14, 2024 18:27:43.282
mount_linux.go:288] Detected umount with unsafe 'not mounted' behavior

Feb 14, 2024 18:27:43.293
mounter.go:211] unmount volume "csi-38e100b5eeb529e82e9cfb68091a734c716db143c2be52cf78b720df72f331cd" at "/var/lib/kubelet/pods/fb74c7d2-2668-4b5b-8849-0d18cb98ee30/volumes/kubernetes.io~csi/source/mount"

Feb 14, 2024 18:27:43.297
mounter.go:216] try to unref read-only snapshot

Feb 14, 2024 18:27:43.297
mounter.go:135] target "/var/lib/kubelet/pods/fb74c7d2-2668-4b5b-8849-0d18cb98ee30/volumes/kubernetes.io~csi/source/mount" is not read-only

Feb 14, 2024 18:27:43.297
mounter.go:222] delete the read-write snapshot

Feb 14, 2024 18:27:43.300
containerd.go:189] remove snapshot "csi-image.warm-metal.tech-csi-38e100b5eeb529e82e9cfb68091a734c716db143c2be52cf78b720df72f331cd"

Feb 14, 2024 18:28:29.938
utils.go:97] GRPC call: /csi.v1.Identity/Probe

Feb 14, 2024 18:29:13.668
utils.go:97] GRPC call: /csi.v1.Node/NodeGetCapabilities

mugdha-adhav commented 7 months ago

Currently, our automated builds only test the driver against k8s v1.25. Here's the compatibility matrix info.

I tested the driver on a kind cluster with k8s version v1.28.3 using containerd, and I was able to run the sample ephemeral workload as expected.

$ kubectl logs ephemeral-volume-thb2m
+ '[' /target '!='  ]
+ '[' -f /target/csi-file1 ]
+ '[' -f /target/csi-file2 ]
+ '[' -d /target/csi-folder1 ]
+ '[' -f /target/csi-folder1/file ]
+ exit 0

mugdha-adhav commented 7 months ago

@woehrl01 I don't see any relevant errors in the logs you shared here.

Also, where is the error you shared in the issue description coming from?

woehrl01 commented 7 months ago

@mugdha-adhav yes, you're right, the logs don't show any problems; that's why I asked if you have additional ideas on how to debug this.

The error I receive is a Kubernetes event on the pod, created by the kubelet:

 {  "event.firstTimestamp": "2024-02-14T19:03:22Z", "event.involvedObject.apiVersion": "v1", "event.involvedObject.fieldPath": "spec.containers{ng}", "event.involvedObject.kind": "Pod", "event.involvedObject.name": "h-8648c99fbf-wkgrg", "event.involvedObject.namespace": "j", "event.involvedObject.resourceVersion": "1600765086", "event.involvedObject.uid": "7b39946a-5be40ee06e49b7d", "event.lastTimestamp": "2024-02-14T19:03:34Z", "event.message": "(combined from similar events): Error: failed to generate container \"218b131a8748748b7ba121c4a2fd5a6b182659fcecdff0357bd106aa1b1fcfb4\" spec: failed to apply OCI options: relabel \"/var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount\" with \"system_u:object_r:data_t:s0:c211,c621\" failed: lsetxattr /var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount/var: read-only file system", "event.metadata.creationTimestamp": "2024-02-14T19:03:22Z", "event.metadata.managedFields[0].apiVersion": "v1", "event.metadata.managedFields[0].fieldsType": "FieldsV1", "event.metadata.managedFields[0].manager": "kubelet", "event.metadata.managedFields[0].operation": "Update", "event.metadata.managedFields[0].time": "2024-02-14T19:03:34Z", "event.metadata.name": "h8648c99fbf-wkgrg.17b3d004cd0490fd", "event.metadata.namespace": "p67747", "event.metadata.resourceVersion": "1600786275", "event.metadata.uid": "ac9784a8-f562-48d4-b9be-fe926e9a3c13", "event.reason": "Failed", "event.source.component": "kubelet", "event.source.host": "ip--.f.compute.internal", "event.type": "Warning", "integrationName": "kube_events", "integrationVersion": "2.8.1", "old_event.count": 1, "old_event.firstTimestamp": "2024-02-14T19:03:22Z", "old_event.involvedObject.apiVersion": "v1", "old_event.involvedObject.fieldPath": "spec.containers{ng}", "old_event.involvedObject.kind": "Pod", "old_event.involvedObject.name": "h-8648c99fbf-wkgrg", "old_event.involvedObject.namespace": "h", 
"old_event.involvedObject.resourceVersion": "1600765086", "old_event.involvedObject.uid": "7b39946a-5beee06e49b7d", "old_event.lastTimestamp": "2024-02-14T19:03:22Z", "old_event.message": "(combined from similar events): Error: failed to generate container \"be6c7b7e8c3d4a3386a047312d173f6a94490d461e3073d6205cc1cf888f8f24\" spec: failed to apply OCI options: relabel \"/var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount\" with \"system_u:object_r:data_t:s0:c211,c621\" failed: lsetxattr /var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount/var: read-only file system", "old_event.metadata.creationTimestamp": "2024-02-14T19:03:22Z", "old_event.metadata.managedFields[0].apiVersion": "v1", "old_event.metadata.managedFields[0].fieldsType": "FieldsV1", "old_event.metadata.managedFields[0].manager": "kubelet", "old_event.metadata.managedFields[0].operation": "Update", "old_event.metadata.managedFields[0].time": "2024-02-14T19:03:22Z", "old_event.metadata.name": "h-8648c99fbf-wkgrg.17b3d004cd0490fd", "old_event.metadata.namespace": "h", "old_event.metadata.resourceVersion": "1600782603", "old_event.metadata.uid": "ac9784a8-f562-48d4-b9b", "old_event.reason": "Failed", "old_event.source.component": "kubelet", "old_event.source.host": ".compute.internal", "old_event.type": "Warning", "summary": "(combined from similar events): Error: failed to generate container \"218b131a8748748b7ba121c4a2fd5a6b182659fcecdff0357bd106aa1b1fcfb4\" spec: failed to apply OCI options: relabel \"/var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount\" with \"system_u:object_r:data_t:s0:c211,c621\" failed: lsetxattr /var/lib/kubelet/pods/7b39946a-5be4-49c4-8f52-20ee06e49b7d/volumes/kubernetes.io~csi/source/mount/var: read-only file system", "timestamp": 1707937414000, "verb": "UPDATE"

woehrl01 commented 7 months ago

I just found the following related issue from a different CSI driver on Bottlerocket; it looks like the problem is related to SELinux + hostPath mounts:

https://github.com/bottlerocket-os/bottlerocket/issues/2556

A fix that passes different mount labels is described here: https://github.com/bottlerocket-os/bottlerocket/issues/2656#issuecomment-1408912457

mugdha-adhav commented 7 months ago

Interesting, it seems the issue is platform-specific. We could add a values-bottlerocket-selinux.yaml file to our charts and add support for passing volumes and volumeMounts parameters.

woehrl01 commented 7 months ago

@mugdha-adhav I think it makes sense to add those different mount options to the Helm chart. I also think the context needs to be passed in the source code. If I see it correctly, the additional mount options need to be set here, right before the mount.All call:

https://github.com/warm-metal/container-image-csi-driver/blob/3d36010ac57a9324f23d96ce22c7355a2878ef42/pkg/backend/containerd/containerd.go#L38-L45
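Something along these lines (a rough sketch of the idea, not the driver's actual code; `Mount` here is a local stand-in for containerd's `mount.Mount` type, and the label value is just an example):

```go
package main

import "fmt"

// Mount is a minimal stand-in for containerd's mount.Mount
// (github.com/containerd/containerd/mount), used here for illustration.
type Mount struct {
	Type    string
	Source  string
	Options []string
}

// withSELinuxContext appends an overlayfs "context=" option to each mount,
// so the kernel applies one SELinux label to the whole mount instead of
// the kubelet relabeling files one by one with lsetxattr, which fails on
// read-only backing stores like Bottlerocket's.
func withSELinuxContext(mounts []Mount, label string) []Mount {
	for i := range mounts {
		mounts[i].Options = append(mounts[i].Options,
			fmt.Sprintf("context=%q", label))
	}
	return mounts
}

func main() {
	mounts := []Mount{{Type: "overlay", Source: "overlay", Options: []string{"rw"}}}
	// Example label only; the real value would come from the pod's SELinux options.
	mounts = withSELinuxContext(mounts, "system_u:object_r:local_t:s0")
	fmt.Println(mounts[0].Options)
}
```

The real change would apply this to the mounts returned by the snapshotter right before the mount.All call.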

mugdha-adhav commented 7 months ago

@woehrl01 would appreciate your help with sending out a PR for this. Please let me know if you want me to assign this issue to you.

haydn-j-evans commented 4 months ago

Hi,

We had the same issue when trying to mount a Loki PVC on Bottlerocket with k8s 1.29.

Loki sets its data chunks to read-only after it has finished writing to them; I assume this is what's actually causing the failure (files in the r/w mount are read-only, so SELinux cannot relabel them).


github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

mugdha-adhav commented 1 month ago

@woehrl01 (or anyone else) would you be interested in sending a fix for this? I haven't worked with bottlerocket yet, so it might take me a bit longer to get the fix out.