Open Xunop opened 1 month ago
This is a tricky issue for a few reasons:
yes-really-destroy-data flag. In case the OSD was accidentally deleted, it's still possible to add it back to the cluster as long as the disk wasn't wiped, so we err on the side of not wiping.
So the recommended approach is currently to change the device filter or other settings and purge the disk manually, so that Rook doesn't try to add the OSD back. Not ideal, but it's a struggle between automation and manual steps, and we never want to lose data accidentally.
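To make the "change the device filter" part concrete, here is a minimal sketch of how a regex-style filter can leave one device out. It assumes the filter is a regular expression matched against short device names. The helper name matchDevices and the loop device names are made up for illustration and are not taken from Rook's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchDevices is a hypothetical helper: it returns the device names
// that match the given filter pattern, in their original order.
func matchDevices(filter string, names []string) ([]string, error) {
	re, err := regexp.Compile(filter)
	if err != nil {
		return nil, fmt.Errorf("invalid device filter %q: %w", filter, err)
	}
	var matched []string
	for _, name := range names {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	// Illustrative device names; loop1 is the disk whose OSD was removed.
	names := []string{"loop0", "loop1", "loop2"}

	// Go's regexp package has no negative lookahead, so the device is
	// excluded by listing only the ones that should still be used.
	matched, err := matchDevices("^loop[02]$", names)
	if err != nil {
		panic(err)
	}
	fmt.Println(matched) // prints [loop0 loop2]
}
```

With a filter like this the removed disk no longer matches, but the disk still has to be purged manually as described above.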
Is this a bug report or feature request?
Deviation from expected behavior:
After following the steps in the Rook documentation for removing an OSD, Rook still starts a deployment for the removed OSD. I only removed the OSD but did not remove the disk from the server.
Expected behavior:
When I remove an OSD from the server but leave the disk in place, Rook should recognize that I no longer need this OSD and should not attempt to start its deployment.
How to reproduce it (minimal and precise):
More information:
I spent some time going through the source code and then realized that this behavior is related to this code: https://github.com/rook/rook/blob/master/pkg/daemon/ceph/osd/volume.go#L107. Since I only removed an OSD and did not add a new one, the getAvailableDevices function skips the devices, resulting in an empty device array being passed to the configureCVDevices function. This function retrieves OSD information using the Ceph command ceph-volume lvm|raw list, which still returns information for all OSDs because Ceph identifies OSDs by checking the block signatures. As a result, it retrieves information for OSDs that have already been deleted. Perhaps we should exclude the OSDs that have been deleted from the OSDs that Ceph retrieves, or consider wiping the Ceph signature when an OSD is removed?
OSD prepare pod log:
After deleting OSD /dev/loop1:
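
Coming back to the first suggestion above (excluding already-deleted OSDs from what ceph-volume reports), a rough sketch of the idea could look like this. Everything here is an assumption for illustration: the osdInfo type, the excludeRemoved helper, the OSD IDs, and the idea that the set of removed IDs is available somewhere are all hypothetical and do not reflect how Rook is actually structured.

```go
package main

import "fmt"

// osdInfo is a hypothetical, simplified stand-in for the per-OSD
// information returned by a ceph-volume list call.
type osdInfo struct {
	ID     int
	Device string
}

// excludeRemoved returns the reported OSDs whose IDs are not in the
// removed set, preserving the original order.
func excludeRemoved(reported []osdInfo, removed map[int]bool) []osdInfo {
	kept := make([]osdInfo, 0, len(reported))
	for _, osd := range reported {
		if removed[osd.ID] {
			// Signature is still on disk, but the OSD was deleted on purpose.
			continue
		}
		kept = append(kept, osd)
	}
	return kept
}

func main() {
	// Illustrative data only: three OSDs are still detectable on disk.
	reported := []osdInfo{
		{ID: 0, Device: "/dev/loop0"},
		{ID: 1, Device: "/dev/loop1"},
		{ID: 2, Device: "/dev/loop2"},
	}
	// Assumption: the IDs of intentionally removed OSDs are tracked somewhere.
	removed := map[int]bool{1: true}

	for _, osd := range excludeRemoved(reported, removed) {
		fmt.Printf("osd.%d on %s\n", osd.ID, osd.Device)
	}
}
```

The open question with this approach is where the set of removed IDs would come from and how to tell an intentional removal from an accidental one. Wiping the signature on removal avoids that bookkeeping, at the cost of being irreversible.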