I'm not an expert in SecurityContext, so you are saying that
securityContext:
  privileged: true
  capabilities:
    add: ["SYS_ADMIN"]
  allowPrivilegeEscalation: true
is the root cause? And from the log you provided, it seems it's not related to CPO but rather to general k8s (the mount part). I will take a deeper look at this in case I have any findings.
No, rather this:
securityContext:
fsGroup: 1000
runAsGroup: 1000
runAsNonRoot: true
runAsUser: 100
It works when I disable it in various charts. An "easier" example is bitnami/postgresql: when I run the chart, I get these mount errors. When I disable these settings, the StatefulSet starts (with PVCs) and all is well.
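For reference, disabling it in the bitnami chart can look roughly like this (a hedged sketch; the exact value keys depend on the chart version, see helm show values bitnami/postgresql):

# hypothetical values override; key names vary across bitnami/postgresql chart versions
cat > values-no-securitycontext.yaml <<'EOF'
primary:
  podSecurityContext:
    enabled: false        # drops fsGroup from the pod spec
  containerSecurityContext:
    enabled: false        # drops runAsUser/runAsNonRoot from the container
EOF
helm install pg bitnami/postgresql -f values-no-securitycontext.yaml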
Are there any other logs to provide? Anything that could be helpful?
@jichenjc I just want to save you some time; it seems there is a problem with "my" setup. I just debugged another issue and found that:
So then what happens is: k8s uses the path anyway (the container writes to it) and then all kinds of things go wrong later on, e.g. the pods are stuck terminating because the path is not empty after unmount. I am not sure yet how to debug why it's not being mounted.
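(A quick way to confirm that failure mode, a sketch assuming the k0s paths used later in this thread, with placeholder IDs: if the bind mount never happened, the "volume" path is just an ordinary directory on the node's root disk.)

# run on the worker node; <pod-uid> and <pv-name> are placeholders
mountpoint /var/lib/k0s/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
# "is not a mountpoint" means the container has been writing to the node's root filesystem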
I am walking through the setup with the nginx example app; it demonstrates what we see otherwise. The pod is deployed and running, and claims that /var/lib/www/html is backed by a volume:
# kubectl describe -n till-test pod/nginx
Name: nginx
Namespace: till-test
Priority: 0
Node: node-005.testing/10.108.109.29
Start Time: Wed, 11 May 2022 11:59:21 +0000
Labels: <none>
Annotations: cni.projectcalico.org/containerID: c301ca49afad81118c677c9281a05c20f3bb0bfd00dd9f72e4ee8d9bd503ffa4
cni.projectcalico.org/podIP: 10.244.196.217/32
cni.projectcalico.org/podIPs: 10.244.196.217/32
kubernetes.io/psp: 00-k0s-privileged
Status: Running
IP: 10.244.196.217
IPs:
IP: 10.244.196.217
Containers:
nginx:
Container ID: containerd://7eca53cf417dec63e921a137fc207d653fa70271df8e151c4431b116db020d4d
Image: nginx
Image ID: docker.io/library/nginx@sha256:19da26bd6ef0468ac8ef5c03f01ce1569a4dbfb82d4d7b7ffbd7aed16ad3eb46
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 11 May 2022 11:59:38 +0000
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/www/html from csi-data-cinderplugin (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tpjw5 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
csi-data-cinderplugin:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: csi-pvc-cinderplugin
ReadOnly: false
kube-api-access-tpjw5:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 14m default-scheduler 0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.
Normal Scheduled 14m default-scheduler Successfully assigned till-test/nginx to node-005.testing
Normal SuccessfulAttachVolume 14m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808"
Normal Pulling 14m kubelet Pulling image "nginx"
Normal Pulled 14m kubelet Successfully pulled image "nginx" in 5.696958139s
Normal Created 14m kubelet Created container nginx
Normal Started 14m kubelet Started container nginx
I verified the pvc is bound:
# kubectl get pvc -n till-test
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
csi-pvc-cinderplugin Bound pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808 1Gi RWO csi-sc-cinderplugin 15m
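(A hedged side check tying the PV to Cinder: for CSI volumes, the PV's spec.csi.volumeHandle should be the Cinder volume ID, using the PV name from above:)

kubectl get pv pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808 \
  -o jsonpath='{.spec.csi.volumeHandle}{"\n"}'
# should print the Cinder volume ID, i.e. 4f71f032-6110-4352-a019-399d0b80da73 below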
Verified with openstack that it's also in the API (openstack volume show pvc-...):
{
"attachments": [
{
"server_id": "1e9a3760-ddd9-44bb-a313-811d7c8e050a",
"attachment_id": "4e9b1eee-cedf-4204-9a6d-0d97d9677127",
"attached_at": "2022-05-11T11:59:22.717899",
"host_name": "redacted",
"volume_id": "4f71f032-6110-4352-a019-399d0b80da73",
"device": "/dev/sdd",
"id": "4f71f032-6110-4352-a019-399d0b80da73"
}
],
"availability_zone": "nova",
"bootable": "false",
"consistencygroup_id": null,
"created_at": "2022-05-11T11:59:20.098842",
"description": "Created by OpenStack Cinder CSI driver",
"encrypted": false,
"id": "4f71f032-6110-4352-a019-399d0b80da73",
"multiattach": false,
"name": "pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808",
"os-vol-tenant-attr:tenant_id": "redacted",
"properties": {
"vzhardware": "1.0.2",
"cinder.csi.openstack.org/cluster": "redacted",
"csi.storage.k8s.io/pv/name": "pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808",
"csi.storage.k8s.io/pvc/namespace": "till-test",
"csi.storage.k8s.io/pvc/name": "csi-pvc-cinderplugin"
},
"replication_status": null,
"size": 1,
"snapshot_id": null,
"source_volid": null,
"status": "in-use",
"type": "default",
"updated_at": "2022-05-11T11:59:22.986200",
"user_id": "redacted"
}
Using the device (/dev/sdd), I can also verify it with fdisk on the node:
sudo fdisk -l /dev/sdd
Disk /dev/sdd: 1 GiB, 1073741824 bytes, 2097152 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
There are a few other disks attached by us; none are 1 GB. So this is it.
The only place where I come up empty is mount | grep sdd. So everything works, but it is not mounted. I've been scrolling through the logs of the controller and the plugin to no avail.
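(Two more host-side checks that could help here, sketched under the same setup:)

# look for the device or the PV name in the kernel's mount table
findmnt --source /dev/sdd
grep pvc-0b3b178e /proc/self/mountinfo
# ext4 remembers its last mount point, which hints at whether/where it was ever mounted
sudo dumpe2fs -h /dev/sdd 2>/dev/null | grep -i 'last mounted'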
These are logs from the provisioner:
I0510 14:43:43.652278 1 controller.go:884] Started provisioner controller cinder.csi.openstack.org_csi-cinder-controllerplugin-0_90275fa2-1cc8-49b6-bab5-e48116599ef0!
I0511 11:59:19.902248 1 controller.go:1332] provision "till-test/csi-pvc-cinderplugin" class "csi-sc-cinderplugin": started
I0511 11:59:19.902536 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"till-test", Name:"csi-pvc-cinderplugin", UID:"0b3b178e-5d7e-44e8-bde8-b007a5541808", APIVersion:"v1", ResourceVersion:"4197899", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "till-test/csi-pvc-cinderplugin"
I0511 11:59:20.193201 1 controller.go:1439] provision "till-test/csi-pvc-cinderplugin" class "csi-sc-cinderplugin": volume "pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808" provisioned
I0511 11:59:20.193238 1 controller.go:1456] provision "till-test/csi-pvc-cinderplugin" class "csi-sc-cinderplugin": succeeded
I0511 11:59:20.200679 1 controller.go:1332] provision "till-test/csi-pvc-cinderplugin" class "csi-sc-cinderplugin": started
I0511 11:59:20.200730 1 controller.go:1341] provision "till-test/csi-pvc-cinderplugin" class "csi-sc-cinderplugin": persistentvolume "pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808" already exists, skipping
I0511 11:59:20.200667 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"till-test", Name:"csi-pvc-cinderplugin", UID:"0b3b178e-5d7e-44e8-bde8-b007a5541808", APIVersion:"v1", ResourceVersion:"4197899", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808
In cinder-csi-nodeplugin-XXX (on the node) I see the following:
I0511 11:59:31.991674 1 mount_linux.go:408] Disk "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_4f71f032-6110-4352-a" appears to be unformatted, attempting to format as type: "ext4" with options: [-F -m0 /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_4f71f032-6110-4352-a]
I0511 11:59:32.125127 1 mount_linux.go:418] Disk successfully formatted (mkfs): ext4 - /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_4f71f032-6110-4352-a /var/lib/k0s/kubelet/plugins/kubernetes.io/csi/pv/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/globalmount
But beyond that, nothing.
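(Notably, that is the stage step, formatting and mounting the globalmount dir, but no publish step follows. A hedged way to check, reusing the placeholder pod name; NodePublishVolume is the CSI call that bind-mounts the globalmount into the pod's volume path:)

kubectl -n kube-system logs cinder-csi-nodeplugin-XXX -c cinder-csi-plugin \
  | grep -E 'NodeStageVolume|NodePublishVolume'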
I found the following on my worker node in the journal:
May 11 11:59:21 node-005.testing k0s[1045]: time="2022-05-11 11:59:21" level=info msg="I0511 11:59:21.578730 1315 reconciler.go:221] \"operationExecutor.VerifyControllerAttachedVolume started for volume \\\"kube-api-access-tpjw5\\\" (UniqueName: \\\"kubernetes.io/projected/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b-kube-api-access-tpjw5\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:21 node-005.testing k0s[1045]: time="2022-05-11 11:59:21" level=info msg="I0511 11:59:21.679515 1315 reconciler.go:258] \"operationExecutor.MountVolume started for volume \\\"kube-api-access-tpjw5\\\" (UniqueName: \\\"kubernetes.io/projected/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b-kube-api-access-tpjw5\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:21 node-005.testing k0s[1045]: time="2022-05-11 11:59:21" level=info msg="I0511 11:59:21.689911 1315 operation_generator.go:756] \"MountVolume.SetUp succeeded for volume \\\"kube-api-access-tpjw5\\\" (UniqueName: \\\"kubernetes.io/projected/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b-kube-api-access-tpjw5\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:31 node-005.testing k0s[1045]: time="2022-05-11 11:59:31" level=info msg="I0511 11:59:31.445437 1315 reconciler.go:221] \"operationExecutor.VerifyControllerAttachedVolume started for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:31 node-005.testing k0s[1045]: time="2022-05-11 11:59:31" level=info msg="I0511 11:59:31.448773 1315 operation_generator.go:1587] \"Controller attach succeeded for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") device path: \\\"\\\"\" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:31 node-005.testing k0s[1045]: time="2022-05-11 11:59:31" level=info msg="I0511 11:59:31.546112 1315 reconciler.go:258] \"operationExecutor.MountVolume started for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:31 node-005.testing k0s[1045]: time="2022-05-11 11:59:31" level=info msg="I0511 11:59:31.546577 1315 operation_generator.go:631] \"MountVolume.WaitForAttach entering for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") DevicePath \\\"\\\"\" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:31 node-005.testing k0s[1045]: time="2022-05-11 11:59:31" level=info msg="I0511 11:59:31.552712 1315 operation_generator.go:641] \"MountVolume.WaitForAttach succeeded for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") DevicePath \\\"csi-1fc085fb7392cf678bdbc0769341b7cb92fa88bf44f01db70ff51ad42d57117f\\\"\" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:32 node-005.testing k0s[1045]: time="2022-05-11 11:59:32" level=info msg="I0511 11:59:32.132470 1315 operation_generator.go:674] \"MountVolume.MountDevice succeeded for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") device mount path \\\"/var/lib/k0s/kubelet/plugins/kubernetes.io/csi/pv/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/globalmount\\\"\" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:32 node-005.testing k0s[1045]: time="2022-05-11 11:59:32" level=info msg="I0511 11:59:32.341262 1315 operation_generator.go:756] \"MountVolume.SetUp succeeded for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"nginx\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \" pod=\"till-test/nginx\"" component=kubelet
May 11 11:59:32 node-005.testing k0s[1045]: time="2022-05-11 11:59:32" level=info msg="time=\"2022-05-11T11:59:32.617992314Z\" level=info msg=\"RunPodsandbox for &PodSandboxMetadata{Name:nginx,Uid:9e9c5f29-dcc3-4b81-8ec2-66770f7c077b,Namespace:till-test,Attempt:0,}\"" component=containerd
...
This seems to suggest the mount worked, but I can't figure out why it didn't.
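(Since kubelet claims SetUp succeeded, one hedged way to see where the mount actually landed is to compare the host's mount table, which is kubelet's view under k0s, with the nodeplugin container's:)

# on the host:
grep pvc-0b3b178e /proc/self/mountinfo
# inside the nodeplugin container (same placeholder pod name as above):
kubectl -n kube-system exec cinder-csi-nodeplugin-XXX -c cinder-csi-plugin -- \
  grep pvc-0b3b178e /proc/self/mountinfo
# a mount that only exists in the container's namespace would explain exactly this:
# the plugin "succeeds", but kubelet and the pod never see the mounted path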
When I delete the pod, more drama in the logs:
May 11 13:02:10 node-005.testing k0s[1045]: time="2022-05-11 13:02:10" level=info msg="I0511 13:02:10.670072 1315 reconciler.go:192] \"operationExecutor.UnmountVolume started for volume \\\"csi-data-cinderplugin\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \"" component=kubelet
May 11 13:02:10 node-005.testing k0s[1045]: time="2022-05-11 13:02:10" level=info msg="E0511 13:02:10.727480 1315 nestedpendingoperations.go:335] Operation for \"{volumeName:kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73 podName:9e9c5f29-dcc3-4b81-8ec2-66770f7c077b nodeName:}\" failed. No retries permitted until 2022-05-11 13:04:12.727446789 +0000 UTC m=+75209.968642245 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume \"csi-data-cinderplugin\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\") pod \"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\" (UID: \"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\") : kubernetes.io/csi: Unmounter.TearDownAt failed to clean mount dir [/var/lib/k0s/kubelet/pods/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount]: kubernetes.io/csi: failed to remove dir [/var/lib/k0s/kubelet/pods/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount]: remove /var/lib/k0s/kubelet/pods/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount: directory not empty" component=kubelet
The directory contains an index.html which I created inside the nginx pod:
node-005 /home/core # ls -lah /var/lib/k0s/kubelet/pods/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount/
total 20K
drwxr-xr-x. 2 root root 4.0K May 11 12:45 .
drwxr-x---. 3 root root 4.0K May 11 11:59 ..
-rw-r--r--. 1 root root 0 May 11 12:45 index.html
When I remove the file, it completes:
May 11 13:10:19 node-005.testing k0s[1045]: time="2022-05-11 13:10:19" level=info msg="I0511 13:10:19.534601 1315 reconciler.go:192] \"operationExecutor.UnmountVolume started for volume \\\"csi-data-cinderplugin\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") pod \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\" (UID: \\\"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\\\") \"" component=kubelet
May 11 13:10:19 node-005.testing k0s[1045]: time="2022-05-11 13:10:19" level=info msg="I0511 13:10:19.588457 1315 operation_generator.go:910] UnmountVolume.TearDown succeeded for volume \"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\" (OuterVolumeSpecName: \"csi-data-cinderplugin\") pod \"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\" (UID: \"9e9c5f29-dcc3-4b81-8ec2-66770f7c077b\"). InnerVolumeSpecName \"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\". PluginName \"kubernetes.io/csi\", VolumeGidValue \"\"" component=kubelet
May 11 13:10:19 node-005.testing k0s[1045]: time="2022-05-11 13:10:19" level=info msg="I0511 13:10:19.635735 1315 reconciler.go:293] \"operationExecutor.UnmountDevice started for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") on node \\\"node-005.testing\\\" \"" component=kubelet
May 11 13:10:19 node-005.testing kernel: Buffer I/O error on dev sdd, logical block 131072, lost sync page write
May 11 13:10:19 node-005.testing kernel: JBD2: Error -5 detected when updating journal superblock for sdd-8.
May 11 13:10:19 node-005.testing kernel: Aborting journal on device sdd-8.
May 11 13:10:19 node-005.testing kernel: Buffer I/O error on dev sdd, logical block 131072, lost sync page write
May 11 13:10:19 node-005.testing kernel: JBD2: Error -5 detected when updating journal superblock for sdd-8.
May 11 13:10:19 node-005.testing kernel: EXT4-fs error (device sdd): ext4_put_super:1189: comm umount: Couldn't clean up the journal
May 11 13:10:19 node-005.testing kernel: EXT4-fs (sdd): Remounting filesystem read-only
May 11 13:10:19 node-005.testing k0s[1045]: time="2022-05-11 13:10:19" level=info msg="I0511 13:10:19.817292 1315 operation_generator.go:1024] UnmountDevice succeeded for volume \"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\") on node \"node-005.testing\" " component=kubelet
May 11 13:10:19 node-005.testing k0s[1045]: time="2022-05-11 13:10:19" level=info msg="I0511 13:10:19.836573 1315 reconciler.go:300] \"Volume detached for volume \\\"pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808\\\" (UniqueName: \\\"kubernetes.io/csi/cinder.csi.openstack.org^4f71f032-6110-4352-a019-399d0b80da73\\\") on node \\\"node-005.testing\\\" DevicePath \\\"csi-1fc085fb7392cf678bdbc0769341b7cb92fa88bf44f01db70ff51ad42d57117f\\\"\"" component=kubelet
May 11 13:10:24 node-005.testing k0s[1045]: time="2022-05-11 13:10:24" level=info msg="I0511 13:10:24.934817 1315 kubelet_volumes.go:160] \"Cleaned up orphaned pod volumes dir\" podUID=9e9c5f29-dcc3-4b81-8ec2-66770f7c077b path=\"/var/lib/k0s/kubelet/pods/9e9c5f29-dcc3-4b81-8ec2-66770f7c077b/volumes\"" component=kubelet
I checked your log again and I suspect this:
May 09 11:03:55 node-006.testing k0s[611865]: time="2022-05-09 11:03:55" level=info msg="E0509 11:03:55.863827 611893 nestedpendingoperations.go:335] Operation for \"{volumeName:kubernetes.io/csi/cinder.csi.openstack.org^59be3fd4-ba05-4d06-a2e1-e604d48072a1 podName: nodeName:}\" failed. No retries permitted until 2022-05-09 11:03:56.363803627 +0000 UTC m=+1452542.044119761 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume \"pvc-2c060d87-4355-436d-b42e-15becdcfb938\" (UniqueName: \"kubernetes.io/csi/cinder.csi.openstack.org^59be3fd4-ba05-4d06-a2e1-e604d48072a1\") pod \"server-1\" (UID: \"3a72dd18-5a6a-4300-acc5-93c09f3b874d\") : applyFSGroup failed for vol 59be3fd4-ba05-4d06-a2e1-e604d48072a1: lstat /var/lib/k0s/kubelet/pods/3a72dd18-5a6a-4300-acc5-93c09f3b874d/volumes/kubernetes.io~csi/pvc-2c060d87-4355-436d-b42e-15becdcfb938/mount: no such file or directory" component=kubelet
It seems https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/csi/csi_mounter.go#L289 has been hit. As you know, CSI needs to perform a set of operations, including publishing the volume on the node, which is what mounts the device; so the mount failed above due to an FS permission issue.
So you might need to check or compare the success case (without your setting) against the fail case in the function volume.SetVolumeOwnership.
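(For context, a conceptual shell sketch of what kubelet's SetVolumeOwnership does when fsGroup is set; the real implementation is Go inside kubelet, and the lstat error above fires because the mount path it expects does not exist:)

# conceptual sketch only -- what fsGroup: 1000 roughly triggers on the pod's volume path
VOL=/var/lib/k0s/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/mount
chown -R :1000 "$VOL"    # group ownership -> fsGroup
chmod -R g+rwX "$VOL"    # group read/write, +x on directories
chmod g+s "$VOL"         # setgid so new files inherit the group
# if "$VOL" was never mounted/created, the very first lstat fails with
# "applyFSGroup failed ... no such file or directory", as in the log above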
Thank you, I'll have a look. I have a k8s cluster provisioned with Magnum to test with today, and this will help with comparing.
So all the cinder stuff is in tree still?
"cinder in tree" ==> I don't understand the question. Cinder itself is upstream OpenStack open source; the driver code used to be in-tree in k8s, but it has all moved to CPO (this repo), so there is no in-tree code any more.
Never mind, it's general mount utils.
Were you able to look at the further comments? The one you looked at is helpful, but the new logs are from node-005. They show that the mount does not complete.
Ok, so, I did the following: I ran the plugin with --v=6 and found this:
I0512 10:44:02.950756 1 nodeserver.go:489] NodeGetCapabilities called with req: &csi.NodeGetCapabilitiesRequest{XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
I0512 10:44:02.950772 1 utils.go:94] [ID:22] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":2}}}]}
I0512 10:44:02.951798 1 utils.go:88] [ID:23] GRPC call: /csi.v1.Node/NodeGetVolumeStats
I0512 10:44:02.951814 1 utils.go:89] [ID:23] GRPC request: {"volume_id":"4f71f032-6110-4352-a019-399d0b80da73","volume_path":"/var/lib/k0s/kubelet/pods/b12e28da-f62c-4255-aafc-8a5f1f1fa567/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount"}
I0512 10:44:02.951867 1 nodeserver.go:497] NodeGetVolumeStats: called with args {"volume_id":"4f71f032-6110-4352-a019-399d0b80da73","volume_path":"/var/lib/k0s/kubelet/pods/b12e28da-f62c-4255-aafc-8a5f1f1fa567/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount"}
E0512 10:44:02.951915 1 utils.go:92] [ID:23] GRPC error: rpc error: code = NotFound desc = target: /var/lib/k0s/kubelet/pods/b12e28da-f62c-4255-aafc-8a5f1f1fa567/volumes/kubernetes.io~csi/pvc-0b3b178e-5d7e-44e8-bde8-b007a5541808/mount not found
I0512 10:44:05.816923 1 utils.go:88] [ID:24] GRPC call: /csi.v1.Identity/Probe
I think this log is from the csi-cinder-controller pod (container cinder-csi-plugin). This led me to double-check how paths are mounted into the nodeplugin DaemonSet, which leaves me with this manifest (which works 🥳):
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: csi-cinder-nodeplugin
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: csi-cinder-nodeplugin
  template:
    metadata:
      labels:
        app: csi-cinder-nodeplugin
    spec:
      tolerations:
        - operator: Exists
      serviceAccount: csi-cinder-node-sa
      hostNetwork: true
      containers:
        - name: node-driver-registrar
          image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v1.3.0
          args:
            - "--v=3"
            - "--csi-address=$(ADDRESS)"
            - "--kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "rm -rf /registration/cinder.csi.openstack.org /registration/cinder.csi.openstack.org-reg.sock"]
          env:
            - name: ADDRESS
              value: /csi/csi.sock
            - name: DRIVER_REG_SOCK_PATH
              value: /var/lib/k0s/kubelet/plugins/cinder.csi.openstack.org/csi.sock
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          imagePullPolicy: "IfNotPresent"
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
        - name: liveness-probe
          image: k8s.gcr.io/sig-storage/livenessprobe:v2.1.0
          args:
            - --csi-address=/csi/csi.sock
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
        - name: cinder-csi-plugin
          securityContext:
            privileged: true
            capabilities:
              add: ["SYS_ADMIN"]
            allowPrivilegeEscalation: true
          image: docker.io/k8scloudprovider/cinder-csi-plugin:v1.23.0
          args:
            - /bin/cinder-csi-plugin
            - "--v=3"
            - "--endpoint=$(CSI_ENDPOINT)"
            - "--cloud-config=$(CLOUD_CONFIG)"
          env:
            - name: CSI_ENDPOINT
              value: unix://csi/csi.sock
            - name: CLOUD_CONFIG
              value: /etc/config/cloud.conf
          imagePullPolicy: "IfNotPresent"
          ports:
            - containerPort: 9808
              name: healthz
              protocol: TCP
          # The probe
          livenessProbe:
            failureThreshold: 5
            httpGet:
              path: /healthz
              port: healthz
            initialDelaySeconds: 10
            timeoutSeconds: 3
            periodSeconds: 10
          volumeMounts:
            - name: socket-dir
              mountPath: /csi
            # FIXME: for k0s
            - name: kubelet-dir
              mountPath: /var/lib/k0s/kubelet
              mountPropagation: "Bidirectional"
            - name: pods-probe-dir
              mountPath: /dev
              mountPropagation: "HostToContainer"
            - name: secret-cinderplugin
              mountPath: /etc/config
              readOnly: true
      volumes:
        # FIXME: for k0s
        - name: socket-dir
          hostPath:
            path: /var/lib/k0s/kubelet/plugins/cinder.csi.openstack.org
            type: DirectoryOrCreate
        # FIXME: for k0s
        - name: registration-dir
          hostPath:
            path: /var/lib/k0s/kubelet/plugins_registry/
            type: Directory
        # FIXME: for k0s
        - name: kubelet-dir
          hostPath:
            path: /var/lib/k0s/kubelet
            type: Directory
        - name: pods-probe-dir
          hostPath:
            path: /dev
            type: Directory
        - name: secret-cinderplugin
          secret:
            secretName: cloud-config
The kubelet-dir has to be mounted at the path the kubelet actually uses on the host (which is not /var/lib/kubelet in my case). Otherwise, there is confusion, which is somehow not turned into an error, which in itself is really, really weird.
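(A hedged way to catch this on other setups: find the kubelet root dir the node really uses and make every path in the nodeplugin manifest agree with it.)

# on the worker node; k0s and similar distros move the kubelet root away from /var/lib/kubelet
pgrep -a kubelet | tr ' ' '\n' | grep -- '--root-dir'   # empty output means the default is used
# then check that hostPath, mountPath and the registration socket all use that root:
grep -n '/var/lib/' cinder-csi-nodeplugin.yaml          # hypothetical local manifest file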
I'll close this for now.
Thanks @jichenjc for the pointers.
ok, glad it works for you~
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
Installed the latest of the cloud-provider, which includes the openstack controller (for LBaaS) and the cinder CSI plugin. We can create and mount external volumes as long as the pod doesn't contain a securityContext. When we disable it, it works; with it enabled, it ultimately fails to mount the volume. See comment: https://github.com/kubernetes/cloud-provider-openstack/issues/1851#issuecomment-1123694877
What you expected to happen:
I can use the volume, the pod starts.
How to reproduce it:
I am using the hashicorp consul chart, but I have this with various charts that add a securityContext. Not sure, see comment: https://github.com/kubernetes/cloud-provider-openstack/issues/1851#issuecomment-1123694877
Anything else we need to know?:
My cinder-csi manifest:
I have some logs, hopefully these are helpful:
I can verify that the volume is attached to the instance.
Then, the next bit makes it fail:
From a pod perspective:
This is the pod definition (helm template ...):
Environment:
Driver version: 1.3.0@v1.21.0, CSI Spec version: 1.3.0
Kubernetes version: v1.23.5+k0s.0