vshn / appcat-service-postgresql

AppCat Service Provider for PostgreSQL
https://vshn.github.io/appcat-service-postgresql/
BSD 3-Clause "New" or "Revised" License

Implement updating instances #88

Closed zugao closed 2 years ago

zugao commented 2 years ago

Summary

Testing:

Update 06/28:

Update: 06/29:

Update: 06/30:

Checklist

For Code changes

ccremer commented 2 years ago

Quite a lot of comments :) Please don't forget the comments that GitHub usually hides or collapses ("load x more comments").

Kidswiss commented 2 years ago

Just FYI: running the operator outside of the cluster is currently not working due to the webhook endpoints not being reachable.

I also tested the case where the resizing of the PVC doesn't work (local kind), you mentioned that it doesn't work. I expected the operator to throw some error in that case. But it looks like it locks completely up if the PVC can't be changed. Afterwards it's not possible to change the instance at all. Not even after restarting it. The instance has to be deleted and re-provisioned in that case. It's probably waiting for some change that will never happen.

ccremer commented 2 years ago

> Just FYI: running the operator outside of the cluster is currently not working due to the webhook endpoints not being reachable.
>
> I also tested the case where the resizing of the PVC doesn't work (local kind), you mentioned that it doesn't work. I expected the operator to throw some error in that case. But it looks like it locks completely up if the PVC can't be changed. Afterwards it's not possible to change the instance at all. Not even after restarting it. The instance has to be deleted and re-provisioned in that case. It's probably waiting for some change that will never happen.

Yeah, it's not ready to be tested yet since we're in the middle of some major restructuring. Currently only the happy path works; most of the error handling for PVC resizing isn't implemented (see also #93). We're well aware that the instance can get stuck if you try to update storage on a non-resizable storage class. We're not going to spend the time to get storage resizing working on local kind; that has to be tested on another cluster that supports dynamic PVC resizing. We're also not currently checking whether the storage class supports resizing.

I've converted to draft again, there's much code change coming that isn't ready for testing just yet.

ccremer commented 2 years ago

@Kidswiss we're ready for another testing round. I suggest checking out #96, which contains the necessary changes for the chart, rather than this branch. Please be aware that changing the storage capacity is not supported when running on local kind. You could, however, change the memory limit, which triggers a pod restart as well. The PVC resize was tested on the appuio beta cluster.

zugao commented 2 years ago

I just added the last tests for this PR. From my side it's ready to be reviewed.

Kidswiss commented 2 years ago

It looks like the operator gets stuck after one change to the instance.

If I first increase the memory from 256 to 512, it works. If I then increase it from 512 to 1024, it gets stuck in such a way that not even restarting the operator applies the change.

Here are the logs when it happens:

2022-06-30T12:29:42.414Z | DEBUG | provider-postgresql.operator | standalone/controller.go:53 | Reconciling | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "9f3f3245-786e-4575-a643-dbb013863fde"}
2022-06-30T12:29:52.458Z | DEBUG | provider-postgresql.operator | standalone/controller.go:53 | Reconciling | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "3f87009b-17c8-4b21-af69-380fec908216"}
2022-06-30T12:29:52.458Z | INFO | provider-postgresql.operator | standalone/controller.go:106 | Waiting until instance becomes ready | {"controller": "postgresqlstandalone.postgresql.appcat.vshn.io", "controllerGroup": "postgresql.appcat.vshn.io", "controllerKind": "PostgresqlStandalone", "postgresqlStandalone": {"name":"my-instance","namespace":"default"}, "namespace": "default", "name": "my-instance", "reconcileID": "3f87009b-17c8-4b21-af69-380fec908216"}
2022-06-30T12:30:08.171Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:96 | received request | {"webhook": "/mutate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "UID": "27f8e925-b769-4c77-8674-d45697c7f592", "kind": "postgresql.appcat.vshn.io/v1alpha1, Kind=PostgresqlStandalone", "resource": {"group":"postgresql.appcat.vshn.io","version":"v1alpha1","resource":"postgresqlstandalones"}}
2022-06-30T12:30:08.172Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:143 | wrote response | {"webhook": "/mutate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "code": 200, "reason": "", "UID": "27f8e925-b769-4c77-8674-d45697c7f592", "allowed": true}
2022-06-30T12:30:08.174Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:96 | received request | {"webhook": "/validate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "UID": "ff9b53e2-4186-45cf-a050-b0e2000199d4", "kind": "postgresql.appcat.vshn.io/v1alpha1, Kind=PostgresqlStandalone", "resource": {"group":"postgresql.appcat.vshn.io","version":"v1alpha1","resource":"postgresqlstandalones"}}
2022-06-30T12:30:08.175Z | DEBUG | provider-postgresql.operator.controller-runtime.webhook.webhooks | admission/http.go:143 | wrote response | {"webhook": "/validate-postgresql-appcat-vshn-io-v1alpha1-postgresqlstandalone", "code": 200, "reason": "", "UID": "ff9b53e2-4186-45cf-a050-b0e2000199d4", "allowed": true}

You can see the reconciles triggered by the change, followed by the operator waiting for the instance to become ready again. After that come the logs from the webhooks, and then nothing happens anymore.

Here's the description of the my-instance CR at the time it happens:

Name:         my-instance
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  postgresql.appcat.vshn.io/v1alpha1
Kind:         PostgresqlStandalone
Metadata:
  Creation Timestamp:  2022-06-30T12:27:19Z
  Finalizers:
    postgresqlstandalone-postgresql-appcat-vshn-io
  Generation:  2
  Managed Fields:
    API Version:  postgresql.appcat.vshn.io/v1alpha1
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2022-06-30T12:27:19Z
    API Version:  postgresql.appcat.vshn.io/v1alpha1
    Manager:      provider-postgresql
    Operation:    Update
    Time:         2022-06-30T12:27:19Z
    API Version:  postgresql.appcat.vshn.io/v1alpha1
    Manager:         node-fetch
    Operation:       Update
    Time:            2022-06-30T12:29:42Z
  Resource Version:  5271
  UID:               472ecf3d-c284-4a4c-abf9-b78fc0a61e26
Spec:
  Backup:
    Enabled:  true
  For Instance:
    Enable Super User:  true
    Major Version:      v14
    Resources:
      Memory Limit:      512Mi
      Storage Capacity:  1Gi
  Write Connection Secret To Ref:
    Name:  my-instance
Status:
  Conditions:
    Last Transition Time:  2022-06-30T12:29:52Z
    Message:               
    Observed Generation:   2
    Reason:                Available
    Status:                True
    Type:                  Ready
  Deployment Strategy:     HelmChart
  Helm Chart:
    Deployment Namespace:  sv-postgresql-s-real-warbound-f832
    Modified At:           2022-06-30T12:29:43Z
    Name:                  postgresql
    Repository:            https://charts.bitnami.com/bitnami
    Version:               11.1.23
  Observed Generation:     2
Events:                    <none>

zugao commented 2 years ago

I have tried changing the memory multiple times without any issues. Make sure to use the other branch, chart-update, for installing and testing.