Closed nsathyaseelan closed 2 years ago
Cause of issue:
When the node is rebooted and the NDM pod comes up, the NDM pod fetches device information before the udev database has been updated. As a result, NDM fails to detect that cstor is installed on the parent disk (say /dev/sdb) and goes on to create a blockdevice resource for the cstor partition. While scanning the details of the partition device (/dev/sdb1), it finds the zfs signature on the device and updates the blockdevice resource of the parent disk with the devlinks of the partition.
CSPI then takes this devlink from the blockdevice resource and tries to use it (/dev/sdb1) in the pool, which fails because the complete disk (/dev/sdb) is already part of a pool.
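To make the mismatch concrete, here is a minimal sketch of how one might check whether a devlink recorded on a blockdevice resource points at a partition rather than the whole disk. It relies only on the udev convention that partition links under /dev/disk/by-path end in `-part<N>`. The helper names `is_partition_devlink` and `parent_disk_devlink` are hypothetical and are not part of NDM or CSPI.

```python
import re

# Hypothetical helpers for illustration only; not NDM or CSPI code.
# udev names partition links by appending "-part<N>" to the disk's link.
_PART_SUFFIX = re.compile(r"-part\d+$")


def is_partition_devlink(devlink: str) -> bool:
    """Return True if the devlink ends in a -part<N> suffix."""
    return _PART_SUFFIX.search(devlink) is not None


def parent_disk_devlink(devlink: str) -> str:
    """Strip a trailing -part<N>; whole-disk links are returned unchanged."""
    return _PART_SUFFIX.sub("", devlink)


if __name__ == "__main__":
    link = "/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0-part1"
    print(is_partition_devlink(link))
    print(parent_disk_devlink(link))
```

A check like this only helps with diagnosis: if the devlink on the parent disk's blockdevice resource ends in `-part<N>`, the resource was most likely populated from the partition scan described above.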
Solution: Restart the NDM pod. When the pod is restarted, it identifies that cstor is installed on the complete disk and updates the devlink to the correct value, after which CSPI comes online.
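One way to restart the NDM pod is to delete it and let its controller recreate it. The sketch below only builds the `kubectl delete pod` command. The namespace `openebs` and the label selector `openebs.io/component-name=ndm` are assumptions about a default install and should be checked against the actual cluster (for example with `kubectl get pods -n openebs --show-labels`). The function name `ndm_restart_command` is hypothetical.

```python
import shlex


def ndm_restart_command(namespace="openebs",
                        selector="openebs.io/component-name=ndm"):
    """Build a kubectl command that deletes the NDM pods.

    Both defaults are assumptions about a typical OpenEBS install;
    verify the namespace and pod labels on your cluster first.
    """
    return ["kubectl", "delete", "pod", "-n", namespace, "-l", selector]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(ndm_restart_command()))
```

Printing rather than executing the command keeps the sketch safe to run anywhere; the printed line can be pasted into a shell once the namespace and selector are confirmed.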
Issues go stale after 90d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.
Closing this issue, as this is a timing issue after the node gets rebooted. A restart of the NDM pod fixes the issue.
CSPI pools went to the offline state upon reboot of the worker nodes, which restarted the operator NDM pods several times. Cause: CSPI took the devlink
/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0-part1
and tried to use it in a pool, and since it was already part of sdb, this caused the issue.
Possible Solution:
/dev/disk/by-path/pci-0000:00:10.0-scsi-0:0:1:0
OpenEBS version: 2.8.0
kubernetes version