Closed by PiotrKlimczak 3 years ago
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.
Deviation from expected behavior: After a csi-rbdplugin restart, pods using storage on the node where csi-rbdplugin was restarted hang endlessly. Force deleting the pods doesn't solve the problem, as the compute machine (VM) then hangs and fails to shut down. This is 100% reproducible.
Expected behavior: A csi-rbdplugin restart should not affect pods or the VM in any way.
How to reproduce it (minimal and precise): Restart the csi-rbdplugin pod on a compute node.
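A minimal sketch of the reproduction step, assuming the default rook-ceph namespace and the standard `app=csi-rbdplugin` DaemonSet label; the pod name and node name are placeholders, adjust for your cluster.

```shell
# Find the csi-rbdplugin pod running on the target compute node:
kubectl -n rook-ceph get pods -l app=csi-rbdplugin -o wide

# Delete it; the DaemonSet recreates it on the same node
# (csi-rbdplugin-xxxxx is a placeholder pod name):
kubectl -n rook-ceph delete pod csi-rbdplugin-xxxxx

# Pods with RBD-backed volumes on that node now hang; even a
# force delete does not recover them:
kubectl delete pod <hung-pod> --grace-period=0 --force
```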
File(s) to submit:
- Operator config map
- Operator deployment

Also, nothing unusual in dmesg.
I have checked all the logs I could, but have not found anything useful or different compared to pods where there was no restart. The CSI RBD plugin seems to start again correctly and does not complain in its logs in any way.
Environment:
- Kernel (`uname -a`): 5.10.19-200.fc33.x86_64
- Rook version (`rook version` inside of a Rook Pod): 1.6.2, was happening on 1.5.11 too
- Ceph version (`ceph -v`): 16.2.3, was happening on previous versions too
- Kubernetes version (`kubectl version`): v1.20.0-1058+7d0a2b269a2741-dirty
- Storage backend status (`ceph health` in the Rook Ceph toolbox): HEALTH_WARN mons are allowing insecure global_id reclaim

We have updated everything in the hope this might be fixed in a newer version, but the problem still persists. Honestly, I have no idea where to look, as I cannot find anything useful or different in the logs when comparing an "unhealthy" VM with a healthy one.
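For reference, a sketch of the commands used to collect the environment details above; the operator deployment and toolbox names assume a default Rook install and may differ in your cluster.

```shell
uname -a            # kernel version
kubectl version     # Kubernetes client/server versions
# Rook version, run inside any Rook pod:
kubectl -n rook-ceph exec deploy/rook-ceph-operator -- rook version
# Ceph version and cluster health, run in the toolbox pod:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -v
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph health
```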