yugabyte / yugabyte-operator

Kubernetes Operator for YugabyteDB (legacy)

[BUG] Secret object gets mistakenly deleted if operator reads stale cluster.Spec.TLS.Enabled #35

Open srteam2020 opened 3 years ago

srteam2020 commented 3 years ago

Describe the bug

After restarting from a crash, the operator can mistakenly delete the Secret objects if it reads a stale value of cluster.Spec.TLS.Enabled.

Consider the following situation: there are two apiservers, apiserver1 and apiserver2, and the operator is initially communicating with apiserver1. The field cluster.Spec.TLS.Enabled is initially set to false and is then changed to true by the user. The operator reconciles and creates the Secret object accordingly. After the Secret object is created, the operator crashes, restarts, and starts communicating with apiserver2. apiserver2 is stale and still reports cluster.Spec.TLS.Enabled as false at that moment. The operator cannot tell whether the data it reads is stale, so it concludes that TLS is disabled and deletes the Secret object.

To Reproduce

Steps to reproduce the behavior:

  1. Create a YBCluster with cluster.Spec.TLS.Enabled set to false.
  2. Change cluster.Spec.TLS.Enabled to true. The operator reconciles and creates the Secret objects. Meanwhile, apiserver2 is lagging behind and still holds cluster.Spec.TLS.Enabled as false.
  3. The operator crashes, restarts, and communicates with apiserver2. It then reconciles and deletes the Secret objects, since cluster.Spec.TLS.Enabled is false on apiserver2.

Fix

We are willing to send a PR to fix this problem. A potential fix is to pass the Secret object's UID as a precondition on deletion. If the operator's view of the Secret is stale, the UID will not match the object currently stored in etcd and the deletion will be rejected.