Closed zugao closed 2 years ago
Quite a lot of comments :) Please don't forget the comments that GitHub usually hides/collapses ("load x more comments").
Just FYI: running the operator outside of the cluster is currently not working due to the webhook endpoints not being reachable.
I also tested the case where the resizing of the PVC doesn't work (local kind), you mentioned that it doesn't work. I expected the operator to throw some error in that case. But it looks like it locks completely up if the PVC can't be changed. Afterwards it's not possible to change the instance at all. Not even after restarting it. The instance has to be deleted and re-provisioned in that case. It's probably waiting for some change that will never happen.
Yeah, it's not ready to be tested yet since we're in the middle of some major restructuring. Currently only the happy path is working; most of the error handling regarding PVC resizing isn't implemented (see also #93). We're well aware that the instance can get stuck if you're trying to update storage on a non-resizable storage class. We're not going to spend the time to get storage resizing working on local kind; that has to be tested on another cluster that supports dynamic PVC resizing. Nor are we currently checking whether the storage class supports resizing.
I've converted this to a draft again; there are many code changes coming that aren't ready for testing just yet.
@Kidswiss we're ready for another testing round.
I suggest checking out #96, which contains the necessary changes for the chart, rather than this branch.
Please be aware that changing the storage capacity is not supported when running on local kind. You could, however, change the memory limit, which triggers a pod restart as well. The PVC resize is tested on the appuio beta cluster.
I just added the last tests for this PR. From my side it's ready to be reviewed.
It looks like the operator gets stuck after one change to the instance.
If I increase the memory from 256 to 512 in a first step, it works. If I then increase it from 512 to 1024 it gets stuck in a way that not even restarting the operator applies the change.
Here are the logs when it happens:
2022-06-30T12:29:42.414Z | DEBUG | provider-postgresql.operator | standalone/controller.go:53 | Reconciling | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "9f3f3245-786e-4575-a643-dbb013863fde"}
2022-06-30T12:29:52.458Z | DEBUG | provider-postgresql.operator | standalone/controller.go:53 | Reconciling | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "3f87009b-17c8-4b21-af69-380fec908216"}
2022-06-30T12:29:52.458Z | INFO | provider-postgresql.operator | standalone/controller.go:106 | Waiting until instance becomes ready | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "3f87009b-17c8-4b21-af69-380fec908216"}
2022-06-30T12:30:08.171Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:96 | received request | {"webhook": "/mutate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "UID": "27f8e925-b769-4c77-8674-d45697c7f592", "kind": "postgresql.appcat.vshn.io/v1alpha1, Kind=PostgresqlStandalone", "resource": {"group":"postgresql.appcat.vshn.io","version":"v1alpha1","resource":"postgresqlstandalones"}}
2022-06-30T12:30:08.172Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:143 | wrote response | {"webhook": "/mutate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "code": 200, "reason": "", "UID": "27f8e925-b769-4c77-8674-d45697c7f592", "allowed": true}
2022-06-30T12:30:08.174Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:96 | received request | {"webhook": "/validate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "UID": "ff9b53e2-4186-45cf-a050-b0e2000199d4", "kind": "postgresql.appcat.vshn.io/v1alpha1, Kind=PostgresqlStandalone", "resource": {"group":"postgresql.appcat.vshn.io","version":"v1alpha1","resource":"postgresqlstandalones"}}
2022-06-30T12:30:08.175Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:143 | wrote response | {"webhook": "/validate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "code": 200, "reason": "", "UID": "ff9b53e2-4186-45cf-a050-b0e2000199d4", "allowed": true}
You can see the reconcile triggered by the change, then that it waits for the instance to become ready again. After that come the logs from the webhooks, and then nothing happens anymore.
Here's the description of the my-instance CR at the time it happens:
Name: my-instance
Namespace: default
Labels: <none>
Annotations: <none>
API Version: postgresql.appcat.vshn.io/v1alpha1
Kind: PostgresqlStandalone
Metadata:
  Creation Timestamp: 2022-06-30T12:27:19Z
  Finalizers:
    postgresqlstandalone-postgresql-appcat-vshn-io
  Generation: 2
  Managed Fields:
    API Version: postgresql.appcat.vshn.io/v1alpha1
    Manager: kubectl-client-side-apply
    Operation: Update
    Time: 2022-06-30T12:27:19Z
    API Version: postgresql.appcat.vshn.io/v1alpha1
    Manager: provider-postgresql
    Operation: Update
    Time: 2022-06-30T12:27:19Z
    API Version: postgresql.appcat.vshn.io/v1alpha1
    Manager: node-fetch
    Operation: Update
    Time: 2022-06-30T12:29:42Z
  Resource Version: 5271
  UID: 472ecf3d-c284-4a4c-abf9-b78fc0a61e26
Spec:
  Backup:
    Enabled: true
  For Instance:
    Enable Super User: true
    Major Version: v14
    Resources:
      Memory Limit: 512Mi
      Storage Capacity: 1Gi
  Write Connection Secret To Ref:
    Name: my-instance
Status:
  Conditions:
    Last Transition Time: 2022-06-30T12:29:52Z
    Message:
    Observed Generation: 2
    Reason: Available
    Status: True
    Type: Ready
  Deployment Strategy: HelmChart
  Helm Chart:
    Deployment Namespace: sv-postgresql-s-real-warbound-f832
    Modified At: 2022-06-30T12:29:43Z
    Name: postgresql
    Repository: https://charts.bitnami.com/bitnami
    Version: 11.1.23
  Observed Generation: 2
Events: <none>
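When reading output like this, one quick check is whether the status' observed generation has caught up with the metadata generation, which by Kubernetes convention indicates that the controller has processed the latest spec. The sketch below only illustrates that comparison with the values from the output above; `instanceState` and `specObserved` are hypothetical names, not the operator's actual types.

```go
package main

import "fmt"

// instanceState is a hypothetical, minimal view of the fields shown in the
// describe output; it is not the operator's actual type.
type instanceState struct {
	Generation         int64 // metadata.generation
	ObservedGeneration int64 // status.observedGeneration
	Ready              bool  // status of the "Ready" condition
}

// specObserved reports whether the controller has processed the latest spec.
func specObserved(s instanceState) bool {
	return s.ObservedGeneration >= s.Generation
}

func main() {
	// Values taken from the describe output above.
	s := instanceState{Generation: 2, ObservedGeneration: 2, Ready: true}
	fmt.Printf("spec observed: %v, ready: %v\n", specObserved(s), s.Ready)
}
```

If a later edit bumps the generation while the observed generation stays behind, that would suggest the controller has not reconciled the newest spec yet.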
I have tried to change the memory multiple times without any issues. Make sure to use the other branch, chart-update, for installing and testing.
Summary
Testing:
For PVC change (only increase is supported). Use the existing instance in beta cluster.
kubectl -n postgresql-system scale deployment provider-postgresql --replicas 0
operator --operator-namespace=postgresql-system
postgresqlstandalone
For other changes use local kind cluster
make chart-prepare local-install samples-install s3-credentials
then edit the postgresqlstandalone
Update 06/28:
Update: 06/29:
Update: 06/30:
Checklist
For Code changes
bug, enhancement, documentation, change, breaking, dependency as they show up in the changelog
area:operator
charts/ directory.