(Describe the feature, bug, question, proposal that you are requesting)
We found that a zookeeper pod keeps crashing after a scale-down followed by a scale-up.
The crash occurs because the new pods reuse PVCs that were not deleted during the scale-down. The problem persists even when we specify reclaimPolicy: Delete.
The root cause is that nothing prevents a scale-up from proceeding before the PVC deletion has finished. zookeeper-operator checks whether the upgrade has failed before updating the StatefulSet, but this check does not account for undeleted orphaned PVCs.
The orphaned PVCs then never get deleted, because the operator waits for the number of ready replicas to equal the number of desired replicas, which never happens while one pod keeps crashing.
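To make the missing guard concrete, here is a minimal sketch of the kind of check we have in mind. It is not the operator's actual code: the function names, the PVC naming pattern (data-zk-N), and the use of plain strings instead of Kubernetes objects are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ordinalOf extracts the trailing StatefulSet ordinal from a PVC name such as
// "data-zk-2". It returns false when the name has no numeric suffix.
// (The naming pattern is an assumption for this sketch.)
func ordinalOf(pvcName string) (int, bool) {
	i := strings.LastIndex(pvcName, "-")
	if i < 0 {
		return 0, false
	}
	n, err := strconv.Atoi(pvcName[i+1:])
	if err != nil {
		return 0, false
	}
	return n, true
}

// orphanPVCs returns the PVCs whose ordinal is >= replicas, i.e. claims that
// no longer belong to any pod at the given cluster size.
func orphanPVCs(pvcNames []string, replicas int) []string {
	var orphans []string
	for _, name := range pvcNames {
		if n, ok := ordinalOf(name); ok && n >= replicas {
			orphans = append(orphans, name)
		}
	}
	return orphans
}

// canScaleUp is the hypothetical guard: a scale-up is allowed only when no
// PVC left over from an earlier scale-down still exists.
func canScaleUp(pvcNames []string, current, desired int) bool {
	if desired <= current {
		return true // not a scale-up, nothing to guard
	}
	return len(orphanPVCs(pvcNames, current)) == 0
}

func main() {
	// Scaled down from 3 to 2 replicas; data-zk-2 has not been deleted yet.
	pvcs := []string{"data-zk-0", "data-zk-1", "data-zk-2"}
	fmt.Println(canScaleUp(pvcs, 2, 3))     // false: orphan still present
	fmt.Println(canScaleUp(pvcs[:2], 2, 3)) // true: orphan already removed
}
```

With a guard like this, the StatefulSet update would be postponed until the leftover claim is gone, so the new pod would start on a fresh volume.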
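The circular dependency can be illustrated with a toy model. This is only a sketch of the reasoning above, not the operator's reconcile loop: the state struct, the reconcile function, and the assumption that exactly one pod is bound to the stale PVC are all hypothetical simplifications.

```go
package main

import "fmt"

// state is a hypothetical, heavily simplified view of the cluster.
type state struct {
	Desired  int  // desired replicas
	Ready    int  // replicas currently ready
	StalePVC bool // true while a pod is bound to a PVC left from the scale-down
}

// reconcile performs one simplified pass of the behavior described above.
func reconcile(s state) state {
	// The pod bound to the stale PVC keeps crashing, so it never becomes ready.
	if s.StalePVC {
		s.Ready = s.Desired - 1
	} else {
		s.Ready = s.Desired
	}
	// PVC cleanup only runs once every desired replica is ready.
	if s.Ready == s.Desired {
		s.StalePVC = false
	}
	return s
}

// converges reports whether the cluster becomes healthy within the given
// number of reconcile passes.
func converges(s state, passes int) bool {
	for i := 0; i < passes; i++ {
		s = reconcile(s)
		if s.Ready == s.Desired && !s.StalePVC {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(converges(state{Desired: 3, StalePVC: true}, 1000))  // false
	fmt.Println(converges(state{Desired: 3, StalePVC: false}, 1000)) // true
}
```

In this model the stale PVC blocks readiness and readiness blocks cleanup, so no number of reconcile passes makes progress.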
Importance
(Indicate the importance of this issue to you (blocker, must-have, should-have, nice-to-have))
Importance: should-have
Location
(Where is the piece of code, package, or document affected by this issue?)
PVC's AccessModes are always overridden by the predefined ReadWriteOnce:
https://github.com/pravega/zookeeper-operator/blob/c29d475794226ce38de021d72430ccde75a38d2a/api/v1beta1/zookeepercluster_types.go#L758
Suggestions for an improvement
(How do you suggest to fix or proceed with this issue?)
PersistentVolumeClaimSpec.AccessModes, as an exposed field, should allow users to customize the AccessModes of the PVC. If for any reason zookeeper can only use the ReadWriteOnce mode, this field should not be exposed to users.
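One way the first suggestion could look is to apply ReadWriteOnce only as a fallback. The sketch below is an assumption about a possible fix, not the code at the linked line: effectiveAccessModes is a hypothetical helper, and plain strings stand in for the Kubernetes access mode type.

```go
package main

import "fmt"

// readWriteOnce stands in for the Kubernetes ReadWriteOnce access mode.
const readWriteOnce = "ReadWriteOnce"

// effectiveAccessModes is a hypothetical helper: it keeps the user's access
// modes when any are set and falls back to ReadWriteOnce only when none are.
func effectiveAccessModes(userModes []string) []string {
	if len(userModes) > 0 {
		return userModes
	}
	return []string{readWriteOnce}
}

func main() {
	fmt.Println(effectiveAccessModes(nil))                       // [ReadWriteOnce]
	fmt.Println(effectiveAccessModes([]string{"ReadWriteMany"})) // [ReadWriteMany]
}
```

If zookeeper really requires ReadWriteOnce, the alternative is to drop the field from the exposed spec instead of silently overriding it.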