Open dmantas opened 8 months ago
[Triage] Thanks @dmantas. With the steps implemented as part of the design in https://github.com/opensearch-project/opensearch-k8s-operator/issues/112#issuecomment-1107854259, the new size should be reflected inside the pod as well. Can you check whether there are any errors in the operator pod? Thanks. Adding @dbason @swoehrl-mw @jochenkressin @pchmielnik @salyh @bbarani
Hi @prudhvigodithi, I followed these steps and was successful, thank you. However, after I deleted the sts, the Operator immediately recreated it, so I had to temporarily delete the Operator to prevent this. Is there any way to temporarily stop the Operator from touching the deployment without deleting it completely? For example, the Prometheus Operator has a `pause` flag for a similar case.
Other than this method, I understand it's not possible to resize volumes automatically, i.e. by modifying the `OpensearchCluster` manifest, correct? Because I did a test today and I noticed these errors in the Operator logs:
{"level":"dpanic","ts":"2024-04-02T11:51:56.608Z","msg":"odd number of arguments passed as key-value pairs for logging","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"7cea47f7-e5e4-430c-9d30-bc066f133eeb","ignored key":"9Gi","stacktrace":"github.com/Opster/opensearch-k8s-operator/opensearch-operator/pkg/reconcilers.(*ClusterReconciler).maybeUpdateVolumes\n\t/workspace/pkg/reconcilers/cluster.go:490\ngithub.com/Opster/opensearch-k8s-operator/opensearch-operator/pkg/reconcilers.(*ClusterReconciler).reconcileNodeStatefulSet\n\t/workspace/pkg/reconcilers/cluster.go:301\ngithub.com/Opster/opensearch-k8s-operator/opensearch-operator/pkg/reconcilers.(*ClusterReconciler).Reconcile\n\t/workspace/pkg/reconcilers/cluster.go:116\ngithub.com/Opster/opensearch-k8s-operator/opensearch-operator/controllers.(*OpenSearchClusterReconciler).reconcilePhaseRunning\n\t/workspace/controllers/opensearchController.go:319\ngithub.com/Opster/opensearch-k8s-operator/opensearch-operator/controllers.(*OpenSearchClusterReconciler).Reconcile\n\t/workspace/controllers/opensearchController.go:141\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"info","ts":"2024-04-02T11:51:56.608Z","msg":"Disk sizes differ for nodePool %s, Current: %s, Desired: %s","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"7cea47f7-e5e4-430c-9d30-bc066f133eeb","data-nodes":"8Gi"}
{"level":"info","ts":"2024-04-02T11:51:56.608Z","msg":"Deleting statefulset while orphaning pods caas-opensearch-data-nodes","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"7cea47f7-e5e4-430c-9d30-bc066f133eeb"}
{"level":"info","ts":"2024-04-02T11:51:56.928Z","msg":"object is being deleted, backing off","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"7cea47f7-e5e4-430c-9d30-bc066f133eeb","name":"caas-opensearch-data-nodes","namespace":"opensearch-operator-cluster","apiVersion":"apps/v1","kind":"StatefulSet"}
{"level":"error","ts":"2024-04-02T11:51:57.050Z","msg":"Reconciler error","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"7cea47f7-e5e4-430c-9d30-bc066f133eeb","error":"StatefulSet.apps \"caas-opensearch-data-nodes\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226"}
{"level":"info","ts":"2024-04-02T11:51:57.051Z","msg":"Reconciling OpenSearchCluster","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","cluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"}}
{"level":"info","ts":"2024-04-02T11:51:57.089Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","interface":"transport"}
{"level":"info","ts":"2024-04-02T11:51:57.089Z","msg":"Generating certificates","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","interface":"http"}
{"level":"info","ts":"2024-04-02T11:51:57.210Z","msg":"resource created","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","name":"caas-opensearch-data-nodes","namespace":"opensearch-operator-cluster","apiVersion":"apps/v1","kind":"StatefulSet"}
{"level":"info","ts":"2024-04-02T11:52:00.592Z","msg":"Starting rolling restart of the StatefulSet caas-opensearch-data-nodes","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","reconciler":"restart"}
{"level":"info","ts":"2024-04-02T11:52:00.596Z","msg":"Preparing to restart pod caas-opensearch-data-nodes-0","controller":"opensearchcluster","controllerGroup":"opensearch.opster.io","controllerKind":"OpenSearchCluster","OpenSearchCluster":{"name":"caas-opensearch","namespace":"opensearch-operator-cluster"},"namespace":"opensearch-operator-cluster","name":"caas-opensearch","reconcileID":"80ef6069-6650-4532-8636-f3756e9a9f02","reconciler":"restart"}
So it seems the Operator "knows" how to do the resize, but it doesn't work. It also seems that only the first pod in the StatefulSet is restarted, and even for that pod the filesystem inside is not enlarged, although all PVCs are. Is there something not working correctly in this implementation?
Again thanks for your help, at least I seem to have a consistent way to do this.
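A side note on the first two log lines above: the `dpanic` entry ("odd number of arguments passed as key-value pairs for logging", with `"ignored key":"9Gi"`) and the unexpanded `%s` verbs in the "Disk sizes differ" message look like a printf-style call handed to a structured logger. The sketch below only simulates that pairing behaviour in plain Go. `pairKeyValues` is a hypothetical helper, not code from the operator or from any logging library, and the argument order (node pool, current size, desired size) is an assumption read off the message text.

```go
package main

import "fmt"

// pairKeyValues mimics, in simplified form, how a structured logger consumes
// variadic arguments as alternating key/value pairs. Hypothetical helper for
// illustration only; not the operator's or any library's implementation.
func pairKeyValues(args ...string) (fields map[string]string, ignored string, odd bool) {
	fields = make(map[string]string)
	for i := 0; i+1 < len(args); i += 2 {
		fields[args[i]] = args[i+1]
	}
	if len(args)%2 == 1 {
		// A trailing argument has no value to pair with and is dropped.
		return fields, args[len(args)-1], true
	}
	return fields, "", false
}

// msg is the message text as it appears in the log above.
const msg = "Disk sizes differ for nodePool %s, Current: %s, Desired: %s"

func main() {
	// Assumed arguments: node pool name, current size, desired size.
	fields, ignored, odd := pairKeyValues("data-nodes", "8Gi", "9Gi")
	fmt.Printf("%q\n", msg)           // the verbs stay unexpanded in the message
	fmt.Println(fields, ignored, odd) // map[data-nodes:8Gi] 9Gi true

	// Formatting the message first leaves no stray arguments to pair.
	fmt.Printf(msg+"\n", "data-nodes", "8Gi", "9Gi")
}
```

If that reading is right, the `dpanic` line is a logging defect and separate from the resize itself: the operator still goes on to delete and recreate the StatefulSet in the lines that follow.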
What is the bug?
When we increase `diskSize` in the `OpensearchCluster` spec, the PVCs are resized accordingly. However, filesystem in the pods is not resized. Or, to be precise, it might be resized for some of the pods.
We use Openstack with Cinder volume (Storage class: `csi-cinder-sc-delete`).
How can one reproduce the bug?
In an already deployed cluster with `diskSize` for a nodePool equal to `5Gi`, modify `diskSize` and increase it to `6Gi`. Then apply the manifest. Then increase it to `7Gi` and apply the manifest again. So the `nodePools` config looks like
Check the PVCs, they are `7Gi` in size:
Check the pods filesystem:
Pod-0:
Pod-1:
Pod-2:
So we can see that somehow 2 out of 3 pods saw the first resizing to 6Gi, but not the second resizing to 7Gi. The 3rd pod is still as if no resizing took place.
I tried to delete the pods, but it doesn't help.
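The comparison described above (PVC size versus the filesystem size seen inside each pod) can be sketched as a small check. This is a minimal illustration under stated assumptions: `parseSize` and `filesystemMatches` are hypothetical helpers, `parseSize` handles only whole numbers with a binary suffix such as `7Gi`, the 5% slack for filesystem overhead is an arbitrary choice, and the per-pod byte counts are made-up stand-ins for values one would read inside the pods, not measurements from this cluster.

```go
package main

import (
	"fmt"
	"strconv"
)

// binarySuffixes maps the binary suffixes used in sizes such as "7Gi" to bytes.
var binarySuffixes = map[string]int64{
	"Ki": 1 << 10, "Mi": 1 << 20, "Gi": 1 << 30, "Ti": 1 << 40,
}

// parseSize converts a whole-number size with a binary suffix ("7Gi") to bytes.
// It covers only that subset of the Kubernetes quantity syntax.
func parseSize(s string) (int64, error) {
	if len(s) < 3 {
		return 0, fmt.Errorf("unsupported size %q", s)
	}
	mult, ok := binarySuffixes[s[len(s)-2:]]
	if !ok {
		return 0, fmt.Errorf("unsupported suffix in %q", s)
	}
	n, err := strconv.ParseInt(s[:len(s)-2], 10, 64)
	if err != nil {
		return 0, err
	}
	return n * mult, nil
}

// filesystemMatches reports whether the filesystem size is within the given
// slack (a fraction, e.g. 0.05) below the requested volume size.
func filesystemMatches(requested, fsBytes int64, slack float64) bool {
	return float64(fsBytes) >= float64(requested)*(1-slack)
}

func main() {
	want, err := parseSize("7Gi")
	if err != nil {
		panic(err)
	}
	// Illustrative values only: two pods near 6Gi, one still near 5Gi.
	pods := []struct {
		name    string
		fsBytes int64
	}{
		{"pod-0", 6 << 30},
		{"pod-1", 6 << 30},
		{"pod-2", 5 << 30},
	}
	for _, p := range pods {
		fmt.Printf("%s resized to 7Gi: %v\n", p.name, filesystemMatches(want, p.fsBytes, 0.05))
	}
}
```

The slack exists because a filesystem usually reports slightly less than the raw volume size, so an exact equality check would flag healthy pods too.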
What is the expected behavior?
The PVCs should be 7Gi. When checking the filesystem from inside the pod (`df -h`), the filesystem should be 7 gigabytes.
What is your host/environment?
Operator version 2.5.1 (also tested with 2.4.0).