rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

failed to mount PVC to pod using cephfs-csi #2987

Closed Madhu-1 closed 5 years ago

Madhu-1 commented 5 years ago

Is this a bug report or feature request? Bug report.

Deviation from expected behavior: the PVC fails to mount to the pod; the mount times out (see kubelet logs below). Expected behavior: mounting the PVC to the pod should be successful.

How to reproduce it (minimal and precise):
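
For reference, here is a minimal sketch of the claim and pod involved (inferred from the pod and PVC descriptions later in this thread, mirroring the CephFS CSI example manifests; treat names and sizes as illustrative):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: csi-cephfs
---
apiVersion: v1
kind: Pod
metadata:
  name: csicephfs-demo-pod
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: cephfs-pvc
        readOnly: false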

Environment:

kubelet logs (from journalctl on worker2):

[root@worker2 vagrant]# journalctl -xe|grep mypvc
Apr 15 06:16:37 worker2 kubelet[3835]: E0415 06:16:37.962016    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:16:37 worker2 kubelet[3835]: E0415 06:16:37.962103    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]
Apr 15 06:18:53 worker2 kubelet[3835]: E0415 06:18:53.966157    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:18:53 worker2 kubelet[3835]: E0415 06:18:53.966196    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]
Apr 15 06:21:08 worker2 kubelet[3835]: E0415 06:21:08.981803    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:21:08 worker2 kubelet[3835]: E0415 06:21:08.981844    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]
Apr 15 06:23:24 worker2 kubelet[3835]: E0415 06:23:24.966423    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:23:24 worker2 kubelet[3835]: E0415 06:23:24.966443    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]
Apr 15 06:25:41 worker2 kubelet[3835]: E0415 06:25:41.975890    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:25:41 worker2 kubelet[3835]: E0415 06:25:41.975933    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]
Apr 15 06:27:58 worker2 kubelet[3835]: E0415 06:27:58.964396    3835 kubelet.go:1680] Unable to mount volumes for pod "csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]; skipping pod
Apr 15 06:27:58 worker2 kubelet[3835]: E0415 06:27:58.964484    3835 pod_workers.go:190] Error syncing pod 6d2a6439-5f45-11e9-be2b-5254007e23ff ("csicephfs-demo-pod_default(6d2a6439-5f45-11e9-be2b-5254007e23ff)"), skipping: timeout expired waiting for volumes to attach or mount for pod "default"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-p8slv]

@rootfs @travisn

Madhu-1 commented 5 years ago

I remember @rootfs said we don't need an attacher pod for cephfs-csi: https://rook-io.slack.com/archives/CG3HUV94J/p1551360632001300

rogaha commented 5 years ago

same here:

(⎈ |ucp_dci-rjkaippa-ucp-46...:rook-ceph) ~/d/r/r/c/e/k/ceph$ kdp csicephfs-demo-pod               15:59:20 ⎇ master
Name:               csicephfs-demo-pod
Namespace:          rook-ceph
Priority:           0
PriorityClassName:  <none>
Node:               ip-172-31-8-23.us-west-2.compute.internal/172.31.8.23
Start Time:         Thu, 18 Apr 2019 15:58:58 -0700
Labels:             <none>
Annotations:        kubernetes.io/psp: privileged
Status:             Pending
IP:
Containers:
  web-server:
    Container ID:
    Image:          nginx
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tdt56 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  cephfs-pvc
    ReadOnly:   false
  default-token-tdt56:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-tdt56
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     com.docker.ucp.manager
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason              Age   From                     Message
  ----     ------              ----  ----                     -------
  Normal   Scheduled           31s   default-scheduler        Successfully assigned rook-ceph/csicephfs-demo-pod to ip-172-31-8-23.us-west-2.compute.internal
  Warning  FailedAttachVolume  8s    attachdetach-controller  AttachVolume.Attach failed for volume "pvc-10fd0623-622d-11e9-9fde-0242ac11000b" : attachment timeout for volume csi-cephfs-pvc-10fd0623-622d-11e9-9fde-0242ac11000b

(⎈ |ucp_dci-rjkaippa-ucp-46...:rook-ceph) ~/d/r/r/c/e/k/ceph$ k describe pvc/cephfs-pvc            16:01:33 ⎇ master
Name:          cephfs-pvc
Namespace:     rook-ceph
StorageClass:  csi-cephfs
Status:        Bound
Volume:        pvc-10fd0623-622d-11e9-9fde-0242ac11000b
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type       Reason                 Age                  From                                                                                     Message
  ----       ------                 ----                 ----                                                                                     -------
  Normal     Provisioning           6m6s                 cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-0_f0bc4d4f-6226-11e9-9318-aec38ceb6b74  External provisioner is provisioning volume for claim "rook-ceph/cephfs-pvc"
  Normal     ExternalProvisioning   6m5s (x2 over 6m6s)  persistentvolume-controller                                                              waiting for a volume to be created, either by external provisioner "cephfs.csi.ceph.com" or manually created by system administrator
  Normal     ProvisioningSucceeded  6m5s                 cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-0_f0bc4d4f-6226-11e9-9318-aec38ceb6b74  Successfully provisioned volume pvc-10fd0623-622d-11e9-9fde-0242ac11000b
Mounted By:  csicephfs-demo-pod
rogaha commented 5 years ago

problem solved -- the attacher was missing in my cluster. :)

JohnStrunk commented 5 years ago

I've also encountered this bug. When installing the CSI drivers via the Rook docs: https://rook.io/docs/rook/master/ceph-csi-drivers.html, there is no attacher running for cephfs.

For RBD, it's in the provisioner pod, but for cephfs, it's missing.
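
A quick way to confirm is to list the containers in each provisioner pod and look for an attacher sidecar; a sketch (pod names and namespace are from a default Rook install, adjust to your cluster):

$ oc -n rook-ceph get pod csi-rbdplugin-provisioner-0 -o jsonpath='{.spec.containers[*].name}'
$ oc -n rook-ceph get pod csi-cephfsplugin-provisioner-0 -o jsonpath='{.spec.containers[*].name}'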

This leads to:

$ oc describe po/centos-cephfs
...
Events:
  Type     Reason              Age                  From                                   Message
  ----     ------              ----                 ----                                   -------
  Warning  FailedScheduling    33m (x3 over 33m)    default-scheduler                      pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
  Normal   Scheduled           33m                  default-scheduler                      Successfully assigned default/centos-cephfs to ip-10-0-132-131.ec2.internal
  Warning  FailedMount         2m7s (x14 over 31m)  kubelet, ip-10-0-132-131.ec2.internal  Unable to mount volumes for pod "centos-cephfs_default(7c9c3fc0-70f5-11e9-a09c-0ecba6260bb8)": timeout expired waiting for volumes to attach or mount for pod "default"/"centos-cephfs". list of unmounted volumes=[data]. list of unattached volumes=[data default-token-4btvz]
  Warning  FailedAttachVolume  109s (x21 over 33m)  attachdetach-controller                AttachVolume.Attach failed for volume "pvc-7c921d6c-70f5-11e9-a09c-0ecba6260bb8" : attachment timeout for volume csi-cephfs-pvc-7c921d6c-70f5-11e9-a09c-0ecba6260bb8

And there are no logs in any of the cephfs CSI containers that indicate ongoing requests.

ShyamsundarR commented 5 years ago

@Madhu-1 the attacher sidecar needs to be running. As discussed here, we need the attacher running alongside the controller service pods (IOW the provisioner pod) for the Kubernetes attach to progress beyond the controller and reach the node service.
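
For illustration, the fix amounts to adding the external-attacher sidecar alongside the existing containers in the cephfs provisioner StatefulSet, roughly like this (only a sketch; the image tag, socket path, and volume name are assumptions and should match what the provisioner container already uses):

      - name: csi-attacher
        image: quay.io/k8scsi/csi-attacher:v1.0.1
        args:
          - "--v=5"
          - "--csi-address=$(ADDRESS)"
        env:
          - name: ADDRESS
            value: /csi/csi-provisioner.sock
        volumeMounts:
          - name: socket-dir
            mountPath: /csi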

Madhu-1 commented 5 years ago

I will send a patch to fix this one.

mr00wka commented 5 years ago

Also tested against k8s version: GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290"

Pods using cephfs volume claims still fail when trying to attach the volume:

Type     Reason              Age                  From                     Message
  ----     ------              ----                 ----                     -------
  Normal   Scheduled           2m54s                default-scheduler        Successfully assigned rook-ceph/csicephfs-no-attacher to worker01
  Warning  FailedMount         51s                  kubelet, worker01        Unable to mount volumes for pod "csicephfs-no-attacher_rook-ceph(49c8ac3d-7162-11e9-a4ed-0050560163f8)": timeout expired waiting for volumes to attach or mount for pod "rook-ceph"/"csicephfs-no-attacher". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-m9z2c]
  Warning  FailedAttachVolume  38s (x7 over 2m39s)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-e85399b0-7161-11e9-a4ed-0050560163f8" : attachment timeout for volume csi-cephfs-pvc-e85399b0-7161-11e9-a4ed-0050560163f8
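
The stuck attach in the events above also shows up as a VolumeAttachment object that never reaches the attached state; an illustrative check (names will differ):

$ kubectl get volumeattachment
$ kubectl describe volumeattachment <attachment-name>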

In addition to Rook's documentation, applying the attacher RBAC and pod as described in ceph/ceph-csi does fix the problem: https://github.com/ceph/ceph-csi/blob/csi-v1.0/docs/deploy-cephfs.md

kubectl create -f csi-attacher-rbac.yaml
kubectl create -f csi-cephfsplugin-attacher.yaml

nehaberry commented 5 years ago

@Madhu-1 "csi-attacher-rbac.yaml" and "csi-cephfsplugin-attacher.yaml" are still not part of rook repo (https://github.com/rook/rook/tree/master/cluster/examples/kubernetes/ceph/csi/rbac/cephfs)

Current content of rook/cluster/examples/kubernetes/ceph/csi/rbac/cephfs are only these 2 rbac files : csi-nodeplugin-rbac.yaml
csi-provisioner-rbac.yaml

Hence, for downstream OCP, if we want to use rook-cephfs=CSI, what is your suggestion ?

Madhu-1 commented 5 years ago

The attacher will be deployed as a container in the provisioner StatefulSet. @nehaberry the above-mentioned files are not available anymore.

nehaberry commented 5 years ago

@Madhu-1 also, the deployment processes for Kubernetes and downstream OCP differ.

In downstream OCP, we are enabling CSI using only these 4 steps:

  1. Run common.yaml --> $oc create -f common.yaml
  2. Create RBAC used by CSI drivers --> $oc apply -f ./csi/rbac/rbd/ ; oc apply -f ./csi/rbac/cephfs/
  3. Run operator-openshift-with-csi.yaml --> $oc create -f operator-openshift-with-csi.yaml
  4. Run cluster.yaml to bring up ceph pods. --> $oc create -f cluster.yaml

But here we have a section with common RBACs for RBD and CephFS and then two separate RBACs for the CSI plugins. Hence, it would be better to have a similar approach for OCP as well.


Deploy RBACs for sidecar containers and node plugins:

kubectl create -f csi-attacher-rbac.yaml
kubectl create -f csi-provisioner-rbac.yaml
kubectl create -f csi-nodeplugin-rbac.yaml
Those manifests deploy service accounts, cluster roles and cluster role bindings. These are shared for both RBD and CephFS CSI plugins, as they require the same permissions.

Deploy CSI sidecar containers:

kubectl create -f csi-cephfsplugin-attacher.yaml
kubectl create -f csi-cephfsplugin-provisioner.yaml
nehaberry commented 5 years ago

The attacher will be deployed as a container in the provisioner StatefulSet. @nehaberry the above-mentioned files are not available anymore.

@Madhu-1 in that case, if I use the current rook repo and bring up rook-ceph-csi using the steps mentioned above, my cephfs PVCs are not getting mounted to the pod.

Aren't we supposed to use operator-openshift-with-csi.yaml and the RBAC files from the rook repo?

Madhu-1 commented 5 years ago

@Madhu-1 also, the process for deployment in kubernetes and downstream OCP differ.

In downstream OCP, we are enabling CSI using only these 4 steps:

  1. Run common.yaml --> $oc create -f common.yaml
  2. Create RBAC used by CSI drivers --> $oc apply -f ./csi/rbac/rbd/ ; oc apply -f ./csi/rbac/cephfs/
  3. Run operator-openshift-with-csi.yaml --> $oc create -f operator-openshift-with-csi.yaml
  4. Run cluster.yaml to bring up ceph pods. --> $oc create -f cluster.yaml

I haven't tested operator-openshift-with-csi.yaml.

But here we have a section with common RBACs for RBD and CephFS and then two separate RBACs for the CSI plugins. Hence, it would be better to have a similar approach for OCP as well.


Deploy RBACs for sidecar containers and node plugins:

kubectl create -f csi-attacher-rbac.yaml
kubectl create -f csi-provisioner-rbac.yaml
kubectl create -f csi-nodeplugin-rbac.yaml
Those manifests deploy service accounts, cluster roles and cluster role bindings. These are shared for both RBD and CephFS CSI plugins, as they require the same permissions.

If you are using Rook, you don't need to follow the docs in the ceph-csi repo.
Deploy CSI sidecar containers:

kubectl create -f csi-cephfsplugin-attacher.yaml
kubectl create -f csi-cephfsplugin-provisioner.yaml

If you are using Rook, you don't need to follow the docs in the ceph-csi repo.

Madhu-1 commented 5 years ago

if I use the current rook repo and bring up rook-ceph-csi using the steps mentioned above, my cephfs pvcs are not getting bound to the pod.

What issue are you facing? Can I get the logs?

nehaberry commented 5 years ago

@Madhu-1 the issue is the same as the one raised in this bug, due to the absence of the attacher pod.

Error from oc describe

Events:
  Type     Reason              Age                  From                                                Message
  ----     ------              ----                 ----                                                -------
  Normal   Scheduled           51m                  default-scheduler                                   Successfully assigned ceph-test/csicephfs-demo-pod to ip-10-0-160-26.us-east-2.compute.internal
  Warning  FailedMount         107s (x22 over 49m)  kubelet, ip-10-0-160-26.us-east-2.compute.internal  Unable to mount volumes for pod "csicephfs-demo-pod_ceph-test(5384b7c0-8ce7-11e9-86f0-02019b8c8146)": timeout expired waiting for volumes to attach or mount for pod "ceph-test"/"csicephfs-demo-pod". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-rmz2d]
  Warning  FailedAttachVolume  77s (x29 over 51m)   attachdetach-controller                             AttachVolume.Attach failed for volume "pvc-479ef49c-8ce7-11e9-86f0-02019b8c8146" : attachment timeout for volume csi-cephfs-pvc-479ef49c-8ce7-11e9-86f0-02019b8c8146

Rook_csi Pod list


oc get pods -o wide -n openshift-storage
NAME                                            READY   STATUS      RESTARTS   AGE    IP             NODE                                         NOMINATED NODE   READINESS GATES
csi-cephfsplugin-97k8b                          2/2     Running     1          135m   10.0.160.26    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-dwhw6                          2/2     Running     0          135m   10.0.140.250   ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-0                  3/3     Running     0          135m   10.129.2.15    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-ssskj                          2/2     Running     1          135m   10.0.155.135   ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-5jlvk                             2/2     Running     0          135m   10.0.140.250   ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-gnlmk                             2/2     Running     1          135m   10.0.155.135   ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-0                     4/4     Running     0          135m   10.129.2.14    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-x9r4b                             2/2     Running     1          135m   10.0.160.26    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-agent-42mlb                           1/1     Running     0          135m   10.0.140.250   ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-agent-kfrrk                           1/1     Running     0          135m   10.0.155.135   ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-ceph-agent-tm769                           1/1     Running     0          135m   10.0.160.26    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-mds-ocsci-cephfs-a-85757d4b94-scvrw   1/1     Running     0          128m   10.128.2.14    ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-ceph-mds-ocsci-cephfs-b-6f4dcbb55d-w5zcw   1/1     Running     0          128m   10.129.2.21    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-7574b89cd5-7qpgp                1/1     Running     0          131m   10.129.2.18    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-65f896d74-w7pcz                 1/1     Running     0          133m   10.129.2.17    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-5698f648c6-jqd96                1/1     Running     0          132m   10.128.2.11    ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-c-674b986ccc-9pssx                1/1     Running     0          131m   10.131.0.13    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-operator-5b6856f864-mhl7s             1/1     Running     0          136m   10.129.2.12    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-0-569585457f-c64v2                1/1     Running     0          129m   10.131.0.15    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-osd-1-68fd5d45cc-fddpf                1/1     Running     0          129m   10.129.2.20    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-54ccd58874-j9kn8                1/1     Running     0          129m   10.128.2.13    ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ip-10-0-140-250-mn4h9     0/2     Completed   0          130m   10.129.2.19    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ip-10-0-155-135-6lxnr     0/2     Completed   0          130m   10.128.2.12    ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-ip-10-0-160-26-gtmhq      0/2     Completed   0          130m   10.131.0.14    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-tools-76c7d559b6-hr649                1/1     Running     0          129m   10.0.160.26    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-discover-bwqc7                             1/1     Running     0          135m   10.131.0.11    ip-10-0-160-26.us-east-2.compute.internal    <none>           <none>
rook-discover-dgrx4                             1/1     Running     0          135m   10.128.2.10    ip-10-0-155-135.us-east-2.compute.internal   <none>           <none>
rook-discover-zknqs                             1/1     Running     0          135m   10.129.2.13    ip-10-0-140-250.us-east-2.compute.internal   <none>           <none>
Madhu-1 commented 5 years ago

@nehaberry check the attacher container logs; that will tell you what the issue is.
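
For example, something like this against the provisioner pod from the listing above (the csi-attacher container name is an assumption, check the pod spec for the exact sidecar name):

$ oc -n openshift-storage logs csi-cephfsplugin-provisioner-0 -c csi-attacher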

pbkh-kimheang commented 1 year ago

@Madhu-1 can I have a question here? Could anyone help me? I cannot mount successfully.

Events:
  Type     Reason       Age                    From                           Message
  ----     ------       ----                   ----                           -------
  Warning  FailedMount  11m (x26 over 169m)    kubelet, sprl-pbkh-kubenode03  Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[default-token-bms74 cephfs-pvc]: timed out waiting for the condition
  Warning  FailedMount  6m53s (x47 over 163m)  kubelet, sprl-pbkh-kubenode03  Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[cephfs-pvc default-token-bms74]: timed out waiting for the condition
  Warning  FailedMount  58s (x92 over 172m)    kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-c266c4e3-9ea2-4b26-9759-b73a5ba3516a" : rpc error: code = Internal desc = an error (exit status 1) occurred while running nsenter args: [--net=/ -- ceph-fuse /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c266c4e3-9ea2-4b26-9759-b73a5ba3516a/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-83e27006-59a6-11ed-97f7-7e2180fc1e5e/66900fdf-648b-49ba-ac19-cf3f32cb874e -o nonempty --client_mds_namespace=cephfs] stderr: nsenter: reassociate to namespace 'ns/net' failed: Invalid argument
Madhu-1 commented 1 year ago

@pbkh-kimheang are you using CSI pod networking and the fuse client?

Please share the Rook version you are using and also the output below.

pbkh-kimheang commented 1 year ago

@Madhu-1. Sir, I am not sure; it is my first time working on it.

are you using csi pod networking and fuse client?

https://github.com/ceph/ceph-csi/tree/devel/examples#running-cephcsi-with-pod-networking

Here is the resulting output; I don't think my Kubernetes server is using Rook. [screenshot]

Sir, I have another question about the example. It seems I need to deploy many things such as the driver, the plugin, the pod security policy, and so on. My purpose is just to create a volume on K8s and mount it to Ceph, but it seems there are many things I need to prepare. I also don't understand the config settings well (I don't have much knowledge of Ceph): https://github.com/ceph/ceph-csi/blob/devel/examples/csi-config-map-sample.yaml#L63-L66

pbkh-kimheang commented 1 year ago

@Madhu-1 Here is how I tested: I downloaded and deployed it as in the example. I did not do anything related to pod networking; I just deployed it. Do I need to do anything else? I am sorry.

pbkh-kimheang commented 1 year ago

[screenshot]

pbkh-kimheang commented 1 year ago

What is wrong, Sir? [screenshot]

pbkh-kimheang commented 1 year ago

@Madhu-1 when the DaemonSet pod on kubenode 1 fails, the deployments created on kubenode 1 cannot find the cephfsplugin to mount and their status is Pending. But when the deployment is created on node 3 (the DaemonSet pod on node 3 is running fine), its status is Waiting and the log is: https://github.com/rook/rook/issues/2987#issuecomment-1298206301

pbkh-kimheang commented 1 year ago

@Madhu-1 do I need to install ceph-fuse on the Kubernetes server?

Madhu-1 commented 1 year ago

Please remove the netNamespaceFilePath entry (https://github.com/ceph/ceph-csi/blob/devel/examples/csi-config-map-sample.yaml#L65) from the configmap and try again. Please open an issue in the https://github.com/ceph/ceph-csi repo, as you are not using Rook.
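
For illustration, that means editing the ceph-csi config ConfigMap so the cephFS section of your cluster entry no longer carries the netNamespaceFilePath key; a sketch only (the ConfigMap name, clusterID, and exact JSON layout follow the linked csi-config-map-sample.yaml, keep your own values):

$ kubectl edit configmap ceph-csi-config -n <csi-namespace>

# before (inside the config.json key):
# [{"clusterID": "<cluster-id>", "monitors": ["172.18.4.26", "172.18.4.31", "172.18.4.32"],
#   "cephFS": {"netNamespaceFilePath": "<path>"}}]
# after:
# [{"clusterID": "<cluster-id>", "monitors": ["172.18.4.26", "172.18.4.31", "172.18.4.32"], "cephFS": {}}]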

pbkh-kimheang commented 1 year ago

@Madhu-1. Thank you, sir. After I removed netNamespaceFilePath, the deployment works fine and this is the log: [screenshot] But I don't think it mounted to cephfs successfully, because I don't see anything in the mount directory /var/lib/www/html of this deployment: https://github.com/ceph/ceph-csi/blob/devel/examples/cephfs/deployment.yaml#L23

Actually, I have created many folders and files on the CephFS that I mount using ceph-fuse on my Ubuntu machine.

Madhu-1 commented 1 year ago

But I don't think it mounted to cephfs successfully, because I don't see anything in the mount directory /var/lib/www/html of this deployment

If the pod is running, it means the subvolume is mounted; you can exec into the pod, run df -h, and see that ceph is mounted to /var/lib/www/html. It's a fresh subvolume created when you created the PVC, so I'm not sure what you expect to see in the mount directory.
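
Something like this (the pod name is a placeholder):

$ kubectl exec -it <your-pod-name> -- df -h /var/lib/www/html
# the Filesystem column should show the ceph mount (e.g. ceph-fuse or <mon-ip>:/volumes/csi/...) rather than the node's root disk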

pbkh-kimheang commented 1 year ago

@Madhu-1 Thank you, Sir. It works fine now after testing. Thank you so much for your support!

pbkh-kimheang commented 1 year ago

What is wrong, Sir? [screenshot]

@Madhu-1 Sir, I am sorry again for the inconvenience. The above was tested on Minikube (stand-alone) and it works fine, but on the real k8s cluster (with the DaemonSet) I have 3 nodes.

I tried to create a deployment and assign it to node 3, but the mount does not succeed. Here is the log:

Events:
  Type     Reason       Age        From                           Message
  ----     ------       ----       ----                           -------
  Normal   Scheduled    <unknown>  default-scheduler              Successfully assigned default/php-server-8466d56997-mwpxz to sprl-pbkh-kubenode03
  Warning  FailedMount  13m        kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:57:50.223+0000 7f08ca90b3c0 -1 init, newargv = 0x5559d758fc50 newargc=17
ceph-fuse[29838]: starting ceph client
ceph-fuse[29838]: fuse failed to start
2022-11-02T10:57:50.235+0000 7f08ca90b3c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:57:51.851+0000 7f5071ca33c0 -1 init, newargv = 0x55ec00585c50 newargc=17
ceph-fuse[29876]: starting ceph client
ceph-fuse[29876]: fuse failed to start
2022-11-02T10:57:51.863+0000 7f5071ca33c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:57:53.967+0000 7f1bf01b83c0 -1 init, newargv = 0x556f9339fc50 newargc=17
ceph-fuse[29944]: starting ceph client
ceph-fuse[29944]: fuse failed to start
2022-11-02T10:57:53.979+0000 7f1bf01b83c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:57:57.071+0000 7f7a845053c0 -1 init, newargv = 0x5637e279ac50 newargc=17
ceph-fuse[30081]: starting ceph client
ceph-fuse[30081]: fuse failed to start
2022-11-02T10:57:57.083+0000 7f7a845053c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:58:02.207+0000 7f13407503c0 -1 init, newargv = 0x559261507c50 newargc=17
ceph-fuse[30193]: starting ceph client
ceph-fuse[30193]: fuse failed to start
2022-11-02T10:58:02.219+0000 7f13407503c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:58:11.459+0000 7f6915a933c0 -1 init, newargv = 0x562056291c50 newargc=17
ceph-fuse[30361]: starting ceph client
ceph-fuse[30361]: fuse failed to start
2022-11-02T10:58:11.471+0000 7f6915a933c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  13m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:58:28.551+0000 7fb86b1923c0 -1 init, newargv = 0x563cdb14dc50 newargc=17
ceph-fuse[30640]: starting ceph client
ceph-fuse[30640]: fuse failed to start
2022-11-02T10:58:28.567+0000 7fb86b1923c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  12m  kubelet, sprl-pbkh-kubenode03  MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T10:59:01.743+0000 7eff36c853c0 -1 init, newargv = 0x55f8b5a09c50 newargc=17
ceph-fuse[31109]: starting ceph client
ceph-fuse[31109]: fuse failed to start
2022-11-02T10:59:01.755+0000 7eff36c853c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
  Warning  FailedMount  2m48s              kubelet, sprl-pbkh-kubenode03  Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[default-token-bms74 cephfs-pvc]: timed out waiting for the condition
  Warning  FailedMount  83s (x6 over 11m)  kubelet, sprl-pbkh-kubenode03  (combined from similar events): MountVolume.MountDevice failed for volume "pvc-24923c07-ccdf-4e83-8ad2-f02332c11881" : rpc error: code = Internal desc = an error (exit status 22) occurred while running ceph-fuse args: [/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-4621e726-5a66-11ed-874c-16d615299a53/fa0296e2-c532-45c7-9fe0-59f755aeeab9 -o nonempty --client_mds_namespace=cephfs] stderr: 2022-11-02T11:10:22.947+0000 7fb4a79853c0 -1 init, newargv = 0x56155567ac50 newargc=17
ceph-fuse[9002]: starting ceph client
ceph-fuse[2022-11-02T11:10:22.963+0000 7fb4a79853c0 -1 fuse_ll: already_fuse_mounted: statx(/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-24923c07-ccdf-4e83-8ad2-f02332c11881/globalmount) failed with error (22) Invalid argument
9002]: fuse failed to start
  Warning  FailedMount  32s (x4 over 11m)  kubelet, sprl-pbkh-kubenode03  Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[cephfs-pvc default-token-bms74]: timed out waiting for the condition

Could you help please?

Madhu-1 commented 1 year ago

Looks like you don't have the CephFS kernel client; please have it installed. The fuse client is not production ready for now; we have some issues with it. Can you please open an issue in the cephcsi repo with more details?
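
A rough way to check and install it on the node; a sketch for a Debian/Ubuntu node (package and module names can differ per distro and kernel):

# check whether the cephfs kernel client is available / loadable
$ grep ceph /proc/filesystems || sudo modprobe ceph

# install the ceph userspace tools and mount helper
$ sudo apt-get install -y ceph-common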

pbkh-kimheang commented 1 year ago

@Madhu-1 Excuse me, Sir. I am not sure: does "cephfs kernel client" mean installing ceph-fuse?

pbkh-kimheang commented 1 year ago

@Madhu-1 Sir. Sorry, on the stand-alone server where I use minikube, I did not additionally install ceph-common and ceph-fuse, but it works fine. Is it related to the Kubernetes version?

pbkh-kimheang commented 1 year ago

Sir, on this driver: https://github.com/ceph/ceph-csi/blob/devel/deploy/cephfs/kubernetes/csidriver.yaml#L11 fsGroupPolicy is not known on k8s v1.18, so I removed it. Can that be the problem?

pbkh-kimheang commented 1 year ago

Here is the log of the driver registrar on node 1 (attachment: container (4).log).

pbkh-kimheang commented 1 year ago
I1103 04:16:10.268711   25783 main.go:166] Version: v2.5.1 
I1103 04:16:10.268749   25783 main.go:167] Running node-driver-registrar in mode=registration 
I1103 04:16:10.271190   25783 main.go:191] Attempting to open a gRPC connection with: "/csi/csi.sock" 
I1103 04:16:10.272353   25783 main.go:198] Calling CSI driver to discover driver name 
I1103 04:16:10.281078   25783 node_register.go:53] Starting Registration Server at: /registration/cephfs.csi.ceph.com-reg.sock 
I1103 04:16:10.281299   25783 node_register.go:62] Registration Server started at: /registration/cephfs.csi.ceph.com-reg.sock 
I1103 04:16:10.281547   25783 node_register.go:92] Skipping HTTP server because endpoint is set to: "" 
I1103 04:16:48.350391   25783 main.go:102] Received GetInfo call: &InfoRequest{} 
I1103 04:16:48.351794   25783 main.go:109] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/cephfs.csi.ceph.com/registration" 
I1103 04:18:01.075543   25783 main.go:120] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: error updating CSINode object with CSI driver node info: error updating CSINode: timed out waiting for the condition; caused by: an error on the server ("") has prevented the request from succeeding (get csinodes.storage.k8s.io sprl-pbkh-kubenode01),} 
E1103 04:18:01.076815   25783 main.go:122] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: error updating CSINode object with CSI driver node info: error updating CSINode: timed out waiting for the condition; caused by: an error on the server ("") has prevented the request from succeeding (get csinodes.storage.k8s.io sprl-pbkh-kubenode01), restarting registration container. 
pbkh-kimheang commented 1 year ago

@Madhu-1 Sir. Please help me. I really need it for my implementation. But I cannot make it work.

Madhu-1 commented 1 year ago

on this driver: https://github.com/ceph/ceph-csi/blob/devel/deploy/cephfs/kubernetes/csidriver.yaml#L11 fsGroupPolicy is not known on k8s v1.18, so I removed it. Can that be the problem?

If it's not supported, you can remove it.

Sorry, on the stand-alone server where I use minikube, I did not additionally install ceph-common and ceph-fuse, but it works fine. Is it related to the Kubernetes version?

Minikube comes with the CephFS kernel client in its ISO image. Can you please tell me where you are deploying the Kubernetes cluster and what the base OS on the machine is?

@pbkh-kimheang please open a new issue here https://github.com/ceph/ceph-csi/issues

pbkh-kimheang commented 1 year ago

@Madhu-1 Thank you, Sir. I will create the issue soon. I get it, so Minikube is the full set. The base OS is Ubuntu 16.04, sir.

pbkh-kimheang commented 1 year ago

@Madhu-1 I have created the issue, sir. I don't know how to write it well. Please check it. https://github.com/ceph/ceph-csi/issues/3493