tman5 opened 1 month ago
Firstly, in the future, please follow the Rook bug template questionnaire that is provided when using the "new issue" button. This helps us better understand and triage issues.
Now, I don't completely understand the scenario, but from context, I believe the issue being described is this: you have removed a node (or multiple nodes), and you no longer want the disks on those removed nodes to be used for Rook/Ceph storage. Is that correct?
If so, this is intended behavior for both Ceph and Rook as a data safety mechanism. Neither Ceph nor Rook can know for sure if a node removal event means that the OSDs are gone forever or not. Many k8s platforms remove "Node" resources as part of normal k8s update management, so node removal does not 1:1 imply OSD removal.
If you have removed the node and the OSDs are not going to be brought back online, the OSD purge workflow can be used to tell Ceph that it no longer needs to track the disks.
https://rook.io/docs/rook/latest-release/Storage-Configuration/Advanced/ceph-osd-mgmt/#remove-an-osd
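For reference, a rough sketch of that purge workflow (not from this thread; it assumes the `rook-ceph` namespace, a deployed `rook-ceph-tools` toolbox, and a placeholder `<ID>` for the OSD being removed — see the linked doc for the authoritative steps):

```sh
# Run the Ceph commands from the toolbox pod (assumed deployment name rook-ceph-tools).
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

# Inside the toolbox: mark the OSD out so Ceph stops mapping data to it.
ceph osd out osd.<ID>

# Once the OSD is down and rebalancing has settled, remove it from the
# CRUSH map, auth database, and OSD map in one step.
ceph osd purge <ID> --yes-i-really-mean-it

# Back outside the toolbox: delete the now-orphaned OSD deployment.
kubectl -n rook-ceph delete deployment rook-ceph-osd-<ID>
```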
Does this help answer the problem you are bringing up, or have I misread something?
Correct. The nodes are already destroyed and the OSDs no longer show up in the cluster. We ran that workflow, but the pods are still having issues mounting, with the errors shown in these events:
Normal Scheduled 19m default-scheduler Successfully assigned coder/coder-onboarding-workspace-596c77bbc8-l9sn7 to host13
Normal SuccessfulAttachVolume 19m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-e1e9ba45-8b06-4b8b-ad68-08f3787cb8f5"
Warning FailedMount 17m kubelet MountVolume.MountDevice failed for volume "pvc-e1e9ba45-8b06-4b8b-ad68-08f3787cb8f5" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedMount 49s (x15 over 17m) kubelet MountVolume.MountDevice failed for volume "pvc-e1e9ba45-8b06-4b8b-ad68-08f3787cb8f5" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-d80a62d2-29cd-401b-b774-5d4a4a5f6efc already exists
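(Not something stated in the thread, but for readers hitting the same `Aborted ... already exists` error: it usually means an earlier CSI request for that volume is still stuck holding its internal per-volume lock. A hedged workaround sketch, assuming the default `rook-ceph` namespace and Rook's standard CSI pod labels:)

```sh
# Check whether the RBD image behind the PVC still has active watchers.
# The image name is typically csi-vol-<uuid>, using the trailing UUID of the
# volume handle; <pool> is whatever pool the storage class points at.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- rbd status <pool>/csi-vol-<uuid>

# If the cluster is otherwise healthy, restarting the RBD CSI pods clears the
# stuck in-flight operation so the kubelet can retry the mount cleanly.
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin
kubectl -n rook-ceph delete pod -l app=csi-rbdplugin-provisioner
```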
If you attempted to purge all 3 OSDs at the same time, you may be experiencing some data loss. Ceph's default is 3 replicas, and in a cluster configured with one OSD per node like this, any 3 nodes/disks are statistically likely to hold some data that is not replicated on any other node/disk. I don't see any other failure domains in the OSD hierarchy, which unfortunately suggests this is likely.
[addendum] If this is the case, it may be necessary to find some way of adding one of the removed OSDs back into the cluster to allow PGs to be read from it and replicated onto other disks.
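(As an aside, a few standard Ceph commands, run from the toolbox pod, can confirm whether PGs are actually missing data; this is a hedged sketch, not part of the thread:)

```sh
ceph health detail            # lists down/incomplete PGs, unfound objects, slow ops
ceph pg dump_stuck inactive   # PGs that cannot serve IO at all right now
ceph pg dump_stuck unclean    # PGs that are degraded or otherwise not fully replicated
# "incomplete", "down", or "unfound objects" states indicate the purged OSDs
# held the only surviving copies and would need to be reintroduced.
ceph osd tree                 # shows the current failure-domain hierarchy
```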
Another possibility is that the ongoing data recovery Ceph is performing is saturating the network links and starving clients of their ability to perform IO. When I see `daemons [osd.ID, ...] have slow ops`, this is my usual first suspect. It's possible Ceph will eventually recover from its current state on its own, at which point client IO should no longer be bottlenecked.
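(If recovery really is the bottleneck, these standard Ceph options can throttle it in favor of client IO; a hedged sketch run from the toolbox, and exact behavior varies by Ceph release:)

```sh
# Slow down backfill/recovery concurrency per OSD.
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1

# On Quincy and later, the mClock scheduler profile can prioritize client ops instead.
ceph config set osd osd_mclock_profile high_client_ops

# Watch the degraded/misplaced object counts shrink; client IO should improve alongside.
ceph -s
```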
After removing the underlying k8s nodes along with their OSDs, rook-ceph is still reporting health issues.
Pods cannot use PVCs at the moment; they fail with these errors: