rook / rook

Storage Orchestration for Kubernetes
https://rook.io
Apache License 2.0

Fencing is not working as expected #14832

Open Madhu-1 opened 1 week ago

Madhu-1 commented 1 week ago

Is this a bug report or feature request?

RBD fencing does not work as expected in cases like the one below.

If the Rook operator is down and the admin adds the taint on the node, Kubernetes removes the VolumeAttachment (VA) objects and tries to bring up the application pods on another node. Once the RBD image is mapped on the new node, the RBD watchers are updated with the new node's IP. As soon as the Rook pod is back up, it receives the events for the node. Because the taint is present on the node, Rook gets the volume details from the node object, fetches the watchers with the rbd status command, and creates the fencing CR for the other (new) node, which causes the problem.

This is a very rare case, but it can happen. If the watcher is absent on the RBD image at the moment the application pods move to another node and Rook tries to find the watchers, Rook won't create the network fence CR. But as soon as the next event is received for the node, it will go ahead, list the watchers, and fence the wrong IP address.

There could be many more cases like this.
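To make the first case concrete, here is a minimal simulation of the race. It is only a sketch: the function names, the data layout, and the IPs are all hypothetical and are not taken from the Rook code. `watchers_of` stands in for parsing the watcher IPs out of the rbd status output.

```python
# Hypothetical, simplified model of the race described above.
# None of these names are Rook APIs; the IPs are made up.

def watchers_of(image_watchers, image):
    """Stand-in for reading the watcher IPs of an image from `rbd status`."""
    return list(image_watchers.get(image, []))


def naive_fence_targets(tainted_node, node_volumes, image_watchers):
    """Collect every current watcher IP of every volume listed on the tainted node."""
    targets = set()
    for image in node_volumes[tainted_node]:
        targets.update(watchers_of(image_watchers, image))
    return targets


node_ips = {"node-a": "10.0.0.1", "node-b": "10.0.0.2"}

# The node object for node-a still lists the volume it used to have.
node_volumes = {"node-a": ["pool/image-1"], "node-b": []}

# Initially the image is watched from node-a.
image_watchers = {"pool/image-1": [node_ips["node-a"]]}

# Operator is down, admin taints node-a, the pod moves to node-b and
# the watcher on the image now carries node-b's IP.
image_watchers["pool/image-1"] = [node_ips["node-b"]]

# Operator comes back and handles the taint event for node-a.
targets = naive_fence_targets("node-a", node_volumes, image_watchers)
print(targets)  # {'10.0.0.2'}
```

The taint is on node-a, but the only address selected for fencing belongs to node-b, where the application is now running.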

Even though the problem exists only for RBD, it would be good to disable fencing for both CephFS and RBD until we are sure it won't cause any problems for existing workloads/applications.

Deviation from expected behavior:

Expected behavior:

Rook should blocklist the right IP address of the client/node, as visible on the Ceph cluster.
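One possible guard, sketched below, is to fence a watcher IP only when it is known to belong to the tainted node. This is an assumption for illustration, not the actual fix: `safe_fence_targets` and the `known_node_addresses` input are hypothetical, and the sketch assumes Rook has some reliable way to know which addresses the node presents to the Ceph cluster.

```python
# Hypothetical guard: only fence watcher IPs that belong to the tainted node.
# How `known_node_addresses` is obtained is deliberately left out; it is an
# assumption of this sketch.

def safe_fence_targets(watcher_ips, known_node_addresses):
    """Return the watcher IPs that match an address known for the tainted node."""
    known = set(known_node_addresses)
    return sorted(ip for ip in watcher_ips if ip in known)


# The watcher already moved to another node: nothing is fenced.
print(safe_fence_targets(["10.0.0.2"], ["10.0.0.1"]))  # []

# The tainted node still holds a watch: only its own address is fenced.
print(safe_fence_targets(["10.0.0.1", "10.0.0.2"], ["10.0.0.1"]))  # ['10.0.0.1']
```

With a check like this, a stale or already-moved watcher results in no fence at all instead of a fence on the wrong node.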

How to reproduce it (minimal and precise):

File(s) to submit:

Logs to submit:

Cluster Status to submit:

Environment:

travisn commented 1 week ago

While the feature is now disabled, reopening to continue working on the fix...

gman0 commented 1 day ago

Hello, I've opened https://github.com/ceph/ceph-csi/issues/4913. The idea is that ceph-csi would store the client addresses as an annotation in k8s, and Rook would then fetch them when fencing a node. Let me know if it makes sense, thanks!
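A rough sketch of how that flow could look, using plain dicts in place of Kubernetes objects. Everything here is an assumption for illustration: the annotation key is made up, and storing it on the Node object (rather than some other object) is just one option, not something decided in the linked issue.

```python
# Hypothetical annotation key; not defined by ceph-csi or Rook.
CLIENT_ADDR_ANNOTATION = "example.invalid/ceph-client-addresses"


def read_addresses(node):
    """Return the client addresses recorded on the node, or an empty list."""
    annotations = node.get("metadata", {}).get("annotations", {})
    raw = annotations.get(CLIENT_ADDR_ANNOTATION, "")
    return [addr for addr in raw.split(",") if addr]


def record_client_address(node, address):
    """CSI side (assumed): remember the address used when mapping on this node."""
    addresses = read_addresses(node)
    if address not in addresses:
        addresses.append(address)
    annotations = node.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[CLIENT_ADDR_ANNOTATION] = ",".join(addresses)


def addresses_to_fence(tainted_node):
    """Rook side (assumed): fence what was recorded, not the live watcher list."""
    return read_addresses(tainted_node)


node_a = {"metadata": {"name": "node-a"}}
node_b = {"metadata": {"name": "node-b"}}

record_client_address(node_a, "10.0.0.1")  # volume first mapped on node-a
record_client_address(node_b, "10.0.0.2")  # pod later moved to node-b

print(addresses_to_fence(node_a))  # ['10.0.0.1']
```

Because the address is recorded at map time and tied to the node, the result no longer depends on which client happens to be watching the image when Rook processes the taint event.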