rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Not able to drain the node due to emptydir #13168

Closed vadlakiran closed 10 months ago

vadlakiran commented 11 months ago

We have a master and 3 worker nodes in our k8s cluster, on which we have deployed rook-ceph. We are using one disk from each of the 3 worker nodes, i.e. 3 OSDs across the 3 worker nodes.

As part of OSD / worker node replacement, we are trying to cordon and drain the worker node (before the drain we removed the OSD from the Ceph cluster, following the documentation).

While draining the worker node we get the following error:

```
~$ sudo kubectl drain worker2 --ignore-daemonsets
node/worker2 already cordoned
error: unable to drain node "worker2", aborting command...

There are pending nodes to be drained: worker2
error: cannot delete Pods with local storage (use --delete-emptydir-data to override): rook-ceph/csi-cephfsplugin-provisioner-5f459cfb94-g6ccv, rook-ceph/csi-rbdplugin-provisioner-84c7fb8d76-d4d4b, rook-ceph/rook-ceph-mds-myfs-b-6cc4cfff96-lwlpw, rook-ceph/rook-ceph-mgr-a-7d58c74554-r5pz7, rook-ceph/rook-ceph-tools-79957dbdb7-rtfww
vmauser@master1:~$
```

Is this a bug report or feature request?

Deviation from expected behavior:

Expected behavior:

How to reproduce it (minimal and precise):

File(s) to submit:

Logs to submit:

Cluster Status to submit:

```
$ ceph status
  cluster:
    id:     00770d4b-045a-4102-bd24-6855c936ad7d
    health: HEALTH_WARN
            Degraded data redundancy: 119492/358476 objects degraded (33.333%), 64 pgs degraded, 65 pgs undersized
            24 pgs not deep-scrubbed in time
            OSD count 2 < osd_pool_default_size 3

  services:
    mon: 3 daemons, quorum b,d,e (age 10d)
    mgr: a(active, since 3w)
    mds: 1/1 daemons up, 1 hot standby
    osd: 2 osds: 2 up (since 8d), 2 in (since 8d)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 65 pgs
    objects: 119.49k objects, 1.6 GiB
    usage:   4.7 GiB used, 495 GiB / 500 GiB avail
    pgs:     119492/358476 objects degraded (33.333%)
             64 active+undersized+degraded
             1  active+undersized

  io:
    client: 853 B/s rd, 2 op/s rd, 0 op/s wr
```

For more details, see the Rook kubectl Plugin

Environment:

vadlakiran commented 11 months ago

Attached the operator.yaml file in txt format. Please check and let me know if it is referring to any local storage.

operator.log
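A quick way to check a manifest like this for local-storage references is to grep it for the relevant volume types (a minimal sketch, assuming the attached file is available locally as operator.yaml):

```
# emptyDir is what "kubectl drain" treats as local storage; hostPath is shown for completeness
grep -nE 'emptyDir|hostPath' operator.yaml
```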

travisn commented 11 months ago

The drain error can safely be ignored for these pods, which are stateless. The error is complaining about the local storage that the pods are using, but it is not a concern for the drain, as it only holds logs: rook-ceph/csi-cephfsplugin-provisioner-5f459cfb94-g6ccv, rook-ceph/csi-rbdplugin-provisioner-84c7fb8d76-d4d4b, rook-ceph/rook-ceph-mds-myfs-b-6cc4cfff96-lwlpw, rook-ceph/rook-ceph-mgr-a-7d58c74554-r5pz7, rook-ceph/rook-ceph-tools-79957dbdb7-rtfww
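Since the flagged volumes only hold logs, the drain can be forced past them with the override that the error message itself suggests (a sketch; the node and pod names are the ones from this thread):

```
# Optionally confirm what volumes one of the flagged pods actually mounts
kubectl get pod csi-rbdplugin-provisioner-84c7fb8d76-d4d4b -n rook-ceph \
  -o jsonpath='{range .spec.volumes[*]}{.name}{"\n"}{end}'

# Then drain, discarding the emptyDir contents of the listed pods
kubectl drain worker2 --ignore-daemonsets --delete-emptydir-data
```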

vadlakiran commented 11 months ago

@travisn Due to the above local storage that the pods are using, we are not able to drain the node. If we drain the node, will all rook-ceph pods be scheduled on another available node? If I want to remove the worker node from the k8s cluster, and that worker node holds a rook-ceph disk (i.e. an OSD), how can we stop or move the pods running on that worker node (cordon and drain it) to another available worker node?

sp98 commented 11 months ago

> If we drain the node, will all rook-ceph pods be scheduled on another available node?

Yes.

> If I want to remove the worker node from the k8s cluster, and that worker node holds a rook-ceph disk (i.e. an OSD), how can we stop or move the pods running on that worker node (cordon and drain it) to another available worker node?

Rook should take care of moving the pods to another available node once you drain a node. With respect to storage, you should be good as long as you drain one node at a time, and only drain the next node once the OSD is back and the data has rebalanced (PGs are active+clean); see the check below.
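For example, OSD and PG state can be checked from the Rook toolbox before moving on to the next node (a minimal sketch; the toolbox deployment name is assumed from the rook-ceph-tools pod shown in this thread):

```
# Wait until the OSD is back up/in and all PGs report active+clean before draining the next node
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd tree
```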

vadlakiran commented 11 months ago

@sp98 Thank you. I have tried to drain with the option below: `kubectl drain worker2 --ignore-daemonsets --delete-emptydir-data`

Then we get the error message below and the node does not drain. The rook-ceph-mon-e-f96ff7588-nnwm9 pod is running on worker2; why is it not being rescheduled on another available worker node?

```
error when evicting pods/"rook-ceph-mon-e-f96ff7588-nnwm9" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod rook-ceph/rook-ceph-mon-e-f96ff7588-nnwm9
error when evicting pods/"rook-ceph-mon-e-f96ff7588-nnwm9" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
```

sp98 commented 11 months ago

> error when evicting pods/"rook-ceph-mon-e-f96ff7588-nnwm9" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

We have a Pod disruption budget (PDB) applied on the mons. This allows only one mon pod to be disrupted at a time; see the sketch below.
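The mon PDB and the disruptions it currently allows can be inspected like this (a sketch; the PDB name matches the `kubectl get pdb` output shared later in this thread):

```
# ALLOWED DISRUPTIONS of 0 means no mon pod can currently be evicted
kubectl get pdb rook-ceph-mon-pdb -n rook-ceph
```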

How many mon pods are running when you run `kubectl drain worker2 --ignore-daemonsets --delete-emptydir-data`? Can you share the output of `kubectl get pods -n rook-ceph`?

And also the output of `kubectl get pdb -n rook-ceph`.

vadlakiran commented 11 months ago

```
~$ sudo kubectl get pods -n rook-ceph -owide
NAME                                                    READY   STATUS      RESTARTS   AGE      NODE          NOMINATED NODE   READINESS GATES
csi-cephfsplugin-9926x                                  3/3     Running     18         118d     worker3   <none>           <none>
csi-cephfsplugin-bpz2x                                  3/3     Running     24         118d     worker2   <none>           <none>
csi-cephfsplugin-pr9mq                                  3/3     Running     30         118d     worker1   <none>           <none>
csi-cephfsplugin-provisioner-5f459cfb94-dqxw8           6/6     Running     0          55m      worker1   <none>           <none>
csi-cephfsplugin-provisioner-5f459cfb94-zf7pj           6/6     Running     24         90d      worker3   <none>           <none>
csi-rbdplugin-f58nm                                     3/3     Running     18         118d     worker3   <none>           <none>
csi-rbdplugin-nq2wm                                     3/3     Running     24         118d     worker2   <none>           <none>
csi-rbdplugin-nr7fb                                     3/3     Running     30         118d     worker1   <none>           <none>
csi-rbdplugin-provisioner-84c7fb8d76-77gbk              6/6     Running     24         90d      worker3   <none>           <none>
csi-rbdplugin-provisioner-84c7fb8d76-dlprv              6/6     Running     0          55m      worker1   <none>           <none>
rook-ceph-crashcollector-jmapworker1-8fc9c8496-d5876    1/1     Running     0          11d      worker1   <none>           <none>
rook-ceph-crashcollector-jmapworker2-798bdbd764-xcsz9   0/1     Pending     0          55m      <none>        <none>           <none>
rook-ceph-crashcollector-jmapworker3-f5b4b4f45-hhnwc    1/1     Running     4          90d      worker3   <none>           <none>
rook-ceph-mds-myfs-a-7598fc97b4-r9znp                   2/2     Running     34         90d      worker3   <none>           <none>
rook-ceph-mds-myfs-b-6cc4cfff96-dk9zp                   2/2     Running     0          55m      worker1   <none>           <none>
rook-ceph-mgr-a-7d58c74554-lhwvs                        2/2     Running     0          55m      worker3   <none>           <none>
rook-ceph-mon-b-755464db8f-jglvs                        2/2     Running     8          93d      worker3   <none>           <none>
rook-ceph-mon-d-c78846766-ls6qb                         0/2     Pending     0          55m      <none>        <none>           <none>
rook-ceph-mon-e-f96ff7588-nnwm9                         2/2     Running     0          11d      worker2   <none>           <none>
rook-ceph-osd-1-65fc776d98-5qp5f                        2/2     Running     0          11d      worker1   <none>           <none>
rook-ceph-osd-2-5b6c9f5947-hh965                        2/2     Running     8          94d      worker3   <none>           <none>
rook-ceph-osd-prepare-jmapworker1-zthl9                 0/1     Completed   0          9d       worker1   <none>           <none>
rook-ceph-osd-prepare-jmapworker3-qgwl2                 0/1     Completed   0          9d       worker3   <none>           <none>
rook-ceph-tools-79957dbdb7-tq56f                        1/1     Running     0          55m      worker1   <none>           <none>

```

Output of `kubectl get pdb -n rook-ceph`:

```
NAME                 MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
rook-ceph-mds-myfs   1               N/A               1                     118d
rook-ceph-mon-pdb    N/A             1                 0                     118d
rook-ceph-osd        N/A             1                 1                     11d
```

sp98 commented 11 months ago

@vadlakiran

Based on the output you have shared, it seems that one of your mon pods, rook-ceph-mon-d-c78846766-ls6qb, is already in Pending state.

That is the reason you are not able to drain the node with mon-e and you are getting the message: `error when evicting pods/"rook-ceph-mon-e-f96ff7588-nnwm9" -n "rook-ceph" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.`

vadlakiran commented 11 months ago

@sp98 If I delete the pending pod, will this resolve the issue?

sp98 commented 11 months ago

It won't. You need 3 mons running for quorum; currently you have only 2. So I would strongly advise fixing the mon quorum before draining any other node.

So you need to figure out why this mon pod was not assigned to any node. You can run `kubectl describe pod <pod-name> -n rook-ceph` to see why the pod was not scheduled; see the sketch below.
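A minimal sketch of those checks, using the pending mon pod name from the `kubectl get pods` output above (the toolbox deployment name is assumed from this thread):

```
# The Events at the end of the describe output usually explain why the scheduler cannot place the pod
kubectl describe pod rook-ceph-mon-d-c78846766-ls6qb -n rook-ceph | tail -n 20

# Mon quorum can also be verified from the toolbox
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph quorum_status --format json-pretty
```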

sp98 commented 11 months ago

@vadlakiran any luck finding out why the `rook-ceph-mon-d-c78846766-ls6qb` pod is pending?

vadlakiran commented 11 months ago

@sp98 Yes, I deleted the deployment whose pod was stuck in Pending status, and then I was able to drain the node.

Thank you

sp98 commented 11 months ago

@vadlakiran can we close this issue?

sp98 commented 10 months ago

@vadlakiran looks like your issue is fixed, so I am closing this for now. Please reopen if you think the issue is not resolved yet.