Closed abyth closed 2 years ago
The support for resizing the storage is indicated in the StorageClass. But AFAIK there is no flag to indicate whether online resizing is supported. It is also not mentioned in any of the links you shared. So it is hard to handle this programmatically if we cannot detect it.
My personal experience (from AWS - I do not have OpenStack anywhere) is that it actually works fine. When online resizing is supported, the volumes never reach the FileSystemResizePending state, and the restart to resize the file system is never triggered. That was at least my observation and assumption from the pods not restarting -> I never looked into the details since it seemed to work fine.
If you know how the online resizing support can be detected, we can see how it could be added to the operator.
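For context, the only expansion-related field Kubernetes exposes today is the StorageClass allowVolumeExpansion flag, which says whether expansion is possible at all, not whether it happens online. A minimal sketch of what can be inspected (assuming kubectl access and a StorageClass named standard; the PVC name is illustrative):

```shell
# Check whether the StorageClass permits volume expansion at all
# (this does NOT tell you whether the expansion happens online):
kubectl get storageclass standard \
  -o jsonpath='{.allowVolumeExpansion}'

# Watch the PVC status conditions during a resize; if the driver
# expands the filesystem online, FileSystemResizePending may never appear:
kubectl get pvc data-my-cluster-kafka-0 \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\n"}{end}'
```

Since neither field distinguishes online from offline expansion, observing the PVC conditions is about the best a controller can do.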
From https://github.com/strimzi/strimzi-kafka-operator/blob/main/documentation/modules/configuring/proc-resizing-persistent-volumes.adoc I see: "Kubernetes increases the capacity of the selected persistent volumes in response to a request from the Cluster Operator. When the resizing is complete, the Cluster Operator restarts all pods that use the resized persistent volumes. This happens automatically."
@scholzj Hello, I want to know: is it necessary for the Cluster Operator to restart all pods? How can I disable the restart if I only change spec.kafka.storage.size and the underlying storage driver (e.g. cinder-csi-plugin) supports online filesystem expansion?
I tested this with the alibaba-cloud CSI driver, which supports online filesystem expansion (https://github.com/kubernetes-sigs/alibaba-cloud-csi-driver), but all pods were restarted (I wish the operator would not restart the Kafka pods). Here are some logs:
2021-09-26 13:19:10 DEBUG Util:129 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Pods resource test-cluster-kafka-1 in namespace kafka-cluster is ready
2021-09-26 13:19:10 DEBUG KafkaRoller:647 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Pod test-cluster-kafka-1 is now ready
2021-09-26 13:19:10 DEBUG KafkaRoller:275 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Considering restart of pod 2 after delay of 0 MILLISECONDS
2021-09-26 13:19:10 DEBUG KafkaAssemblyOperator:3351 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Rolling update of kafka-cluster/test-cluster-kafka: pod test-cluster-kafka-2 has strimzi.io/generation=3; sts has strimzi.io/generation=4
2021-09-26 13:19:10 DEBUG KafkaAssemblyOperator:3462 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Rolling update of kafka-cluster/test-cluster-kafka: pod test-cluster-kafka-2 has strimzi.io/custom-listener-cert-thumbprints=; sts has strimzi.io/custom-listener-cert-thumbprints=
2021-09-26 13:19:10 DEBUG KafkaAssemblyOperator:3513 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Rolling pod test-cluster-kafka-2 due to [Pod has old generation]
2021-09-26 13:19:10 INFO KafkaRoller:507 - Reconciliation #22(watch) Kafka(kafka-cluster/test-cluster): Pod 2 needs to be restarted. Reason: [Pod has old generation]
@huntkalio The way it works in Strimzi is that we wait for the PVC to be in the FileSystemResizePending state, which indicates that the Pod might need to be restarted. I'm not aware of any other way to find out that a restart would be needed, so I do not think we can change the behaviour.
The state is checked during the periodic reconciliations. So if you see it roll the pods, it means the PVC remains in this state for some time. For the record, I'm not sure whether a driver that resizes the filesystem online should set this state at all. I never found proper documentation explaining how online FS resizing should work. But as long as there is no better way to detect that a restart is needed, there is only so much one can do about it.
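For reference, the condition the operator waits for appears in the PVC status roughly like this (a sketch of the Kubernetes API shape, not output captured from this cluster; sizes are illustrative):

```yaml
# Excerpt of a PVC status while a filesystem resize is pending;
# Strimzi treats this condition as "the pod must be restarted".
status:
  phase: Bound
  capacity:
    storage: 27Gi          # old size still in effect
  conditions:
    - type: FileSystemResizePending
      status: "True"
      message: Waiting for user to (re-)start a pod to finish
        file system resize of volume on node.
```

Once the pod restarts and the kubelet finishes the filesystem resize, the condition is cleared and capacity reflects the new size.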
@scholzj
1. The log shows 'Pod has old generation'. Does this mean the restart is due to the storage config change rather than FileSystemResizePending?
2. And if the Cluster Operator watches FileSystemResizePending and then restarts the cluster, can we add a config option to disable the restart so that the ZooKeeper or Kafka cluster will not restart?
2. And if the Cluster Operator watches FileSystemResizePending and then restarts the cluster, can we add a config option to disable the restart so that the ZooKeeper or Kafka cluster will not restart?
TBH, I'm not sure we want to do this. A lot of effort, hard to test and maintain, and IMHO very little gain out of it.
1. The log shows 'Pod has old generation'. Does this mean the restart is due to the storage config change rather than FileSystemResizePending?
If you think it is coming from somewhere other than the resize process, can you provide the full operator log in DEBUG mode? I do not think the small snippet you shared above makes it completely clear where and when it comes from.
@scholzj
cluster-operator image: quay.io/strimzi/operator:0.25.0
kafka-cluster yaml: test-aliyun-cluster.yaml.txt
1. First, I tested resizing the Kafka storage (spec.kafka.storage.volumes.0.size) from 27Gi to 28Gi and found that all 3 Kafka pods restarted with "Pod 1~3 needs to be restarted. Reason: [Pod has old generation]". The log is here: resize_kafka.log
2. Then I tested resizing the ZooKeeper storage (spec.zookeeper.storage.volumes.0.size) from 20Gi to 21Gi and found that only one pod, test-cluster-zookeeper-1, restarted.
2021-09-27 11:12:26 DEBUG KafkaAssemblyOperator:2504 - Reconciliation #279(watch) Kafka(kafka-cluster/test-cluster): The PVC data-test-cluster-zookeeper-0 is resizing, nothing to do
2021-09-27 11:12:26 INFO KafkaAssemblyOperator:2510 - Reconciliation #279(watch) Kafka(kafka-cluster/test-cluster): The PVC data-test-cluster-zookeeper-1 is waiting for file system resizing and the pod test-cluster-zookeeper-1 needs to be restarted.
2021-09-27 11:12:26 DEBUG KafkaAssemblyOperator:2504 - Reconciliation #279(watch) Kafka(kafka-cluster/test-cluster): The PVC data-test-cluster-zookeeper-2 is resizing, nothing to do
The log is here: resize_zookeeper.log
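The resize being tested boils down to bumping the size in the Kafka custom resource; a hedged sketch of one way to do it with a JSON patch (cluster name and namespace taken from the logs above; assumes JBOD storage with a volume at index 0, as in the attached cluster yaml):

```shell
# Bump the first JBOD volume from 27Gi to 28Gi; the Cluster Operator
# then requests the PVC resize during its next reconciliation.
kubectl patch kafka test-cluster -n kafka-cluster --type json \
  -p '[{"op": "replace", "path": "/spec/kafka/storage/volumes/0/size", "value": "28Gi"}]'
```

A JSON patch targets exactly one field, which avoids accidentally replacing the whole volumes list the way a merge patch on an array would.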
I looked into it ... I needed to refresh my memory by reading the code. The short story: this is expected.
Re 1) Right now, Strimzi uses StatefulSets, and they make it hard to do things such as adding or removing volumes for the JBOD storage. To be able to hack around it, we need a special rolling update and we also carry some special annotations, and that is why the pods need to be rolled even when the operator does not catch the FileSystemResizePending condition. More details about this are in #4153. We might be able to improve this as/if we manage to remove the StatefulSets, which is something we are looking into.
Re 2) This seems to be what I suggested before. Even if your infra does not require pod restarts, if the timing is such that the operator sees the volume with FileSystemResizePending, it will roll it. It looks like one pod (volume) had this condition during the reconciliation, but the other pods did not, so they were not rolled. ZooKeeper does not support JBOD storage, so it does not need the additional rolling update.
OK, I got it, thanks. Another question: if I deploy the Kafka cluster with the persistent-claim storage type (not jbod) and then change the Kafka storage size, is the cluster expected to do a rolling update? It seems it will also roll the Kafka cluster. (I will test this tomorrow.)
You should always use JBOD storage - even with a single disk - because you never know when you might need to add more disks. But anyway, the behaviour described above is the same regardless.
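A sketch of what single-disk JBOD storage looks like in the Kafka custom resource (names and sizes are illustrative; unrelated parts of the spec are omitted):

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: test-cluster
spec:
  kafka:
    # JBOD with a single volume behaves like plain persistent-claim
    # storage, but leaves room to append volumes with ids 1, 2, ...
    # later without a storage-type migration.
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 28Gi
          deleteClaim: false
    # (replicas, listeners, and the zookeeper section omitted)
```

Resizing then means editing the size of the volume entry; adding capacity later means appending a new entry to the volumes list.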
Ok, I got it.
Re 1) Right now, Strimzi uses StatefulSets and they make it hard to do things such as adding or removing volumes for the JBOD storage. To be able to hack around it, we need to have a special rolling update and we also carry some special annotations and that is why the pods need to be rolled even when it does not catch the FileSystemResizePending condition. More details about this are in https://github.com/strimzi/strimzi-kafka-operator/pull/4153. We might be able to improve this as/if we manage to remove the StatefulSets which is something we are looking into.
This should be now fixed by #6310 - but only when StrimziPodSets are used (currently an alpha-level feature gate which needs to be enabled). There is no plan to fix it for StatefulSets => any solution there would not be easy because of the complications with adding / removing volumes.
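If I recall the mechanism correctly, Strimzi feature gates are enabled through the STRIMZI_FEATURE_GATES environment variable on the Cluster Operator Deployment; a sketch for the gate mentioned above (gate name as of the Strimzi versions around #6310):

```yaml
# Fragment of the Cluster Operator Deployment's container spec
# enabling the StrimziPodSets feature gate (alpha at the time
# of this thread).
env:
  - name: STRIMZI_FEATURE_GATES
    value: "+UseStrimziPodSets"
```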
Is your feature request related to a problem? Please describe.
Strimzi supports resizing PVCs and restarts pods when the PVCs reach the FileSystemResizePending state. However, if the underlying storage driver (e.g. cinder-csi-plugin) supports online filesystem expansion, such a restart is not necessary. The pods are then scheduled for a restart while in most cases the PVCs are already resized.

Describe the solution you'd like
Strimzi should detect online filesystem expansion functionality, e.g. based on the storage class being used. If such functionality is detected, Strimzi should not restart pods when PVCs reach the FileSystemResizePending state.

Describe alternatives you've considered
Strimzi could offer an optional parameter with which users can indicate online filesystem expansion. After setting such a parameter, Strimzi should not restart pods when PVCs reach the FileSystemResizePending state.

Additional context
CSI developer guide about volume expansion
Original blog post about online expansion
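The storage-class based detection suggested above could only key off allowVolumeExpansion, which is the one expansion-related field a StorageClass carries today and which does not distinguish online from offline expansion; the opt-out parameter from the alternative would have to be a new, currently hypothetical Strimzi option. A sketch of an expandable StorageClass (provisioner name matches the cinder-csi-plugin example from the request):

```yaml
# A StorageClass that permits expansion (allowVolumeExpansion is a
# real Kubernetes field); whether the expansion happens online is
# decided by the CSI driver, not by this flag.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-ssd
provisioner: cinder.csi.openstack.org
allowVolumeExpansion: true
```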