kubernetes-sigs / alibaba-cloud-csi-driver

CSI Plugin for Kubernetes, Support Alibaba Cloud EBS/NAS/OSS/CPFS
Apache License 2.0

unable to attach disk to pod #959

Closed sateesh9in closed 9 months ago

sateesh9in commented 10 months ago

What happened:

I am trying to integrate the ABC CSI plugin into a self-managed Kubernetes cluster. The disk is provisioned and attached to the ECS instance, and it is visible at the OS level with the lsblk command. However, the same disk is not attached to the pod. https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster

What you expected to happen:

The disk should attach to the pod successfully.

How to reproduce it (as minimally and precisely as possible):

Create a new Kubernetes (RKE2) cluster on ECS instances running Ubuntu 22.04.3, then follow https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster

Anything else we need to know?:

Environment:

csi-plugin logs:

    dd and globDevices with [] "
    time="2024-01-25T14:50:58+08:00" level=info msg="GetVolumeDeviceName, Get Device Name by Config File d-l4v35dmp3ishqms8iczf, DeviceName: "
    time="2024-01-25T14:50:58+08:00" level=error msg="NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName: "
    E0125 14:50:58.475194 1172637 utils.go:101] GRPC error: rpc error: code = Aborted desc = NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName:
    time="2024-01-25T14:53:00+08:00" level=info msg="NodeStageVolume: Stage VolumeId: d-l4v35dmp3ishqms8iczf, Target Path: /var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/1f211609332a6811005604d84f7eb1cac97d337b47ee3119b4df6ea97756a290/globalmount, VolumeContext: map[storage.kubernetes.io/csiProvisionerIdentity:1706161276794-8145-diskplugin.csi.alibabacloud.com type:cloud_essd volume.kubernetes.io/storage-provisioner:diskplugin.csi.alibabacloud.com]"
    time="2024-01-25T14:53:00+08:00" level=info msg="AttachDisk: Starting Do AttachDisk: DiskId: d-l4v35dmp3ishqms8iczf, InstanceId: i-l4v3vvyeoczrqsgrd0pe, Region: me-central-1"
    time="2024-01-25T14:53:00+08:00" level=info msg="Get AK: use ENV AK"
    time="2024-01-25T14:53:00+08:00" level=warning msg="GetDevice: Get volume d-l4v35dmp3ishqms8iczf device by Serial, but validate error List Device Path empty for /dev/vdd and globDevices with [] "
    time="2024-01-25T14:53:00+08:00" level=info msg="GetVolumeDeviceName, Get Device Name by Config File d-l4v35dmp3ishqms8iczf, DeviceName: "
    time="2024-01-25T14:53:05+08:00" level=info msg="AttachDisk: find disk dev after 5 seconds"
    time="2024-01-25T14:53:05+08:00" level=warning msg="GetDevice: Get volume d-l4v35dmp3ishqms8iczf device by Serial, but validate error List Device Path empty for /dev/vdd and globDevices with [] "
    time="2024-01-25T14:53:05+08:00" level=info msg="GetVolumeDeviceName, Get Device Name by Config File d-l4v35dmp3ishqms8iczf, DeviceName: "
    time="2024-01-25T14:53:05+08:00" level=error msg="NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName: "
    E0125 14:53:05.618123 1172637 utils.go:101] GRPC error: rpc error: code = Aborted desc = NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName:
    time="2024-01-25T14:55:07+08:00" level=info msg="NodeStageVolume: Stage VolumeId: d-l4v35dmp3ishqms8iczf, Target Path: /var/lib/kubelet/plugins/kubernetes.io/csi/diskplugin.csi.alibabacloud.com/1f211609332a6811005604d84f7eb1cac97d337b47ee3119b4df6ea97756a290/globalmount, VolumeContext: map[storage.kubernetes.io/csiProvisionerIdentity:1706161276794-8145-diskplugin.csi.alibabacloud.com type:cloud_essd volume.kubernetes.io/storage-provisioner:diskplugin.csi.alibabacloud.com]"
    time="2024-01-25T14:55:07+08:00" level=info msg="AttachDisk: Starting Do AttachDisk: DiskId: d-l4v35dmp3ishqms8iczf, InstanceId: i-l4v3vvyeoczrqsgrd0pe, Region: me-central-1"
    time="2024-01-25T14:55:07+08:00" level=info msg="Get AK: use ENV AK"
    time="2024-01-25T14:55:07+08:00" level=warning msg="GetDevice: Get volume d-l4v35dmp3ishqms8iczf device by Serial, but validate error List Device Path empty for /dev/vdd and globDevices with [] "
    time="2024-01-25T14:55:07+08:00" level=info msg="GetVolumeDeviceName, Get Device Name by Config File d-l4v35dmp3ishqms8iczf, DeviceName: "
    time="2024-01-25T14:55:12+08:00" level=info msg="AttachDisk: find disk dev after 5 seconds"
    time="2024-01-25T14:55:12+08:00" level=warning msg="GetDevice: Get volume d-l4v35dmp3ishqms8iczf device by Serial, but validate error List Device Path empty for /dev/vdd and globDevices with [] "
    time="2024-01-25T14:55:12+08:00" level=info msg="GetVolumeDeviceName, Get Device Name by Config File d-l4v35dmp3ishqms8iczf, DeviceName: "
    time="2024-01-25T14:55:12+08:00" level=error msg="NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName: "
    E0125 14:55:12.695107 1172637 utils.go:101] GRPC error: rpc error: code = Aborted desc = NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: AttachDisk: disk device cannot be found in node, diskid: d-l4v35dmp3ishqms8iczf, devicenName:
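
For anyone triaging similar csi-plugin output, the repeating attach/retry cycle above is easier to read once the error-level entries are pulled out. The following is a minimal sketch, not part of the driver: it assumes the log uses the time="..." level=... msg="..." layout seen above and that messages contain no embedded double quotes. The names ENTRY_RE and entries_at_level are hypothetical helpers introduced for illustration.

```python
import re

# Assumption: entries look like time="..." level=... msg="..." and the
# message text itself contains no double quotes (true for the log above).
ENTRY_RE = re.compile(r'time="([^"]+)"\s+level=(\w+)\s+msg="([^"]*)"')


def entries_at_level(log_text, level):
    """Return (timestamp, message) pairs for entries logged at `level`.

    Works on one-entry-per-line logs and on logs pasted as a single run
    of text, because the pattern is searched across the whole string.
    """
    return [
        (timestamp, message.strip())
        for timestamp, entry_level, message in ENTRY_RE.findall(log_text)
        if entry_level == level
    ]


if __name__ == "__main__":
    sample = (
        'time="2024-01-25T14:53:05+08:00" level=info '
        'msg="AttachDisk: find disk dev after 5 seconds" '
        'time="2024-01-25T14:53:05+08:00" level=error '
        'msg="NodeStageVolume: Attach volume: d-l4v35dmp3ishqms8iczf with error: '
        'AttachDisk: disk device cannot be found in node, '
        'diskid: d-l4v35dmp3ishqms8iczf, devicenName: "'
    )
    for timestamp, message in entries_at_level(sample, "error"):
        print(timestamp, message)
```

Filtering on "warning" instead surfaces the GetDevice lines, which are the ones that name the device path the plugin looked for (/dev/vdd in this report).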

mowangdk commented 10 months ago

What is your ECS instance type, and what is the CSI driver version? @sateesh9in

huww98 commented 10 months ago

@sateesh9in Do you have /dev mounted into the csi-plugin container? It seems CSI cannot see the /dev/vdd file.

https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/974b6fb06645e4e153cae033afae862c4749b570/deploy/ecs/csi-plugin.yaml#L158-L160
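
One way to confirm this from inside the container is to test whether the device node is visible there at all. The sketch below is an illustration only and assumes a Python interpreter is available in the environment being checked, which may not be true of the csi-plugin image; is_block_device is a hypothetical helper name, and /dev/vdd is taken from the log above.

```python
import os
import stat


def is_block_device(path):
    """Return True if `path` exists and is a block device node."""
    try:
        return stat.S_ISBLK(os.stat(path).st_mode)
    except OSError:
        # Covers a missing path as well as permission problems.
        return False


if __name__ == "__main__":
    # /dev/vdd is the device path reported in the GetDevice warning.
    print("/dev/vdd visible:", is_block_device("/dev/vdd"))
```

If the check passes on the host but fails inside the container, the host /dev is probably not mounted into the container, which matches the "List Device Path empty" warning.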

sateesh9in commented 10 months ago

hi huww98,

I changed mountPath from /abc to /dev. After that, I get the error below and the csi-plugin container crashes. I followed the KB below; kindly suggest whether any changes are required. https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster

    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/kubelet/
      mountPropagation: Bidirectional
      name: kubelet-dir
    - mountPath: /host/etc
      name: etc
    - mountPath: /var/log/
      name: host-log
    - mountPath: /host/usr/
      name: ossconnectordir
    - mountPath: /var/lib/container
      mountPropagation: Bidirectional
      name: container-dir
    - mountPath: /dev
      mountPropagation: HostToContainer
      name: host-dev
      readOnly: true
    - mountPath: /host/var/run/
      name: fuse-metrics-dir

    Normal Started 45s kubelet Started container nas-driver-registrar
    Normal Started 45s kubelet Started container oss-driver-registrar
    Warning Failed 45s kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/b1991118-0400-42e8-abf0-336a9d0789c1/containers/csi-plugin/02f85e4d" to rootfs at "/dev/termination-log": open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/csi-plugin/rootfs/dev/termination-log: read-only file system: unknown
    Warning Failed 44s kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/b1991118-0400-42e8-abf0-336a9d0789c1/containers/csi-plugin/8a66454a" to rootfs at "/dev/termination-log": open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/csi-plugin/rootfs/dev/termination-log: read-only file system: unknown
    Warning Failed 29s kubelet Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "/var/lib/kubelet/pods/b1991118-0400-42e8-abf0-336a9d0789c1/containers/csi-plugin/d193f917" to rootfs at "/dev/termination-log": open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/csi-plugin/rootfs/dev/termination-log: read-only file system: unknown
    Warning BackOff 15s (x5 over 44s) kubelet Back-off restarting failed container
    Normal Pulled 4s (x4 over 45s) kubelet Container image "registry-vpc.me-central-1.aliyuncs.com/acs/csi-plugin:v1.24.5-39a3970-aliyun" already present on machine
    Normal Created 4s (x4 over 45s) kubelet Created container csi-plugin


Thanks Sateesh

sateesh9in commented 10 months ago

Hi mowangdk,

I implemented this per the KB below. Please see the "Mount a NAS file system by using the CSI plug-in" section.

https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster

ECS instance type: ecs.g6.3xlarge
CSI plugin version: csi-plugin:v1.24.5-39a3970-aliyun

Thanks & Regards sateesh

huww98 commented 10 months ago

So why do you set readOnly: true? I expect it to work if you remove that line.
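
To spell out the likely reason: Kubernetes defaults a container's terminationMessagePath to /dev/termination-log, and the events above show the runtime failing to create that file because /dev is mounted read-only. The check below is a rough sketch of that conflict, not anything from the driver or from Kubernetes itself. It takes a container spec as a plain dict (as it would look after loading the manifest), and read_only_mounts_covering is a hypothetical helper name.

```python
import posixpath

# Kubernetes default for a container's terminationMessagePath.
DEFAULT_TERMINATION_MESSAGE_PATH = "/dev/termination-log"


def read_only_mounts_covering(container, path=None):
    """Return mountPaths of readOnly volumeMounts that contain `path`.

    If `path` is not given, the container's terminationMessagePath is
    used, falling back to the Kubernetes default.
    """
    if path is None:
        path = container.get(
            "terminationMessagePath", DEFAULT_TERMINATION_MESSAGE_PATH
        )
    hits = []
    for mount in container.get("volumeMounts", []):
        mount_path = posixpath.normpath(mount["mountPath"])
        prefix = mount_path.rstrip("/") + "/"
        covers = path == mount_path or path.startswith(prefix)
        if mount.get("readOnly", False) and covers:
            hits.append(mount["mountPath"])
    return hits


if __name__ == "__main__":
    # Two of the mounts from the spec posted earlier in this thread.
    csi_plugin = {
        "name": "csi-plugin",
        "volumeMounts": [
            {"mountPath": "/var/lib/kubelet/", "name": "kubelet-dir"},
            {"mountPath": "/dev", "name": "host-dev", "readOnly": True},
        ],
    }
    print(read_only_mounts_covering(csi_plugin))
```

An empty result means no read-only mount sits on top of the termination log path; with the spec as posted, the /dev mount is reported.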

huww98 commented 10 months ago

> I implemented this per the KB below. Please see the "Mount a NAS file system by using the CSI plug-in" section. https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster

That KB is for NAS, while you are trying to attach an EBS disk. Please refer to this YAML to set up your CSI plugin: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/974b6fb06645e4e153cae033afae862c4749b570/deploy/ecs/csi-plugin.yaml#L158-L160

sateesh9in commented 10 months ago

hi huww98,

Is it safe to remove the readOnly: true line if I follow and implement the setup above for the EBS disk?

Thanks Sateesh

huww98 commented 10 months ago

It should be pretty safe. We deploy CSI on official Alibaba Cloud Kubernetes Service (ACK) clusters without readOnly: true. However, we have not conducted extensive testing on Ubuntu yet. Please proceed with caution.

I would suggest you follow this doc: https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver/blob/master/docs/disk.md

sateesh9in commented 10 months ago

Hi huww98,

Thank you for the suggestions. I will follow the above doc to implement the CSI plugin.

Thanks Sateesh

mowangdk commented 10 months ago

> Hi mowangdk,
>
> I implemented this per the KB below. Please see the "Mount a NAS file system by using the CSI plug-in" section.
>
> https://www.alibabacloud.com/help/en/nas/user-guide/mount-a-nas-file-system-on-a-self-managed-kubernetes-cluster
>
> ECS instance type: ecs.g6.3xlarge
> CSI plugin version: csi-plugin:v1.24.5-39a3970-aliyun
>
> Thanks & Regards sateesh

The document you mentioned is not maintained by us. We will contact the person in charge to fix the issue.

sateesh9in commented 9 months ago

Thank you for the support.