janus-idp / backstage-operator

Apache License 2.0
4 stars 11 forks source link

Add checkPostPassword PostHook #22

Closed PatAKnight closed 10 months ago

PatAKnight commented 1 year ago

Description

This PR adds a new PostHook called checkPostPassword that will check the Helm Release to see if upstream.postgresql.auth.password and upstream.postgresql.auth.postgresPassword has been set. The purpose of this is to fix the problem of reconcile error stating that upstream.postgresql.auth.password and upstream.postgresql.auth.postgresPassword must not be empty.

The checkPostPassword will check for the name of the postgresql secret based on whether or not upstream.postgresql.fullnameOverride or upstream.postgresql.nameOverride have been set. It will then check if the password and postgresPassword parameters have been set.

Which issues(s) does this PR fix

PatAKnight commented 1 year ago

I wasn't able to make .upstream.postgresql.auth.existingSecret consistently work. I would continue to get the errors PASSWORDS ERROR: You must provide your current passwords when upgrading the release. The only way that I could circumvent this was to use .upstream.postgresql.auth.password and .upstream.postgresql.auth.postgresPassword. When it did work, I ended up in a loop where during an upgrade existingSecret would be set and the secret that was created by the operator was deleted, then on the next cycle existingSecret would be unset and the secret would be created.

As to the exposure problem, I would assume that this does.

I ended up having to choose postHook instead of prehook because the latter will run before any install, upgrade, and uninstall. This means on install the two password fields will not get set until after we upgrade since we have to wait for the secret to be created. The problem is that before an upgrade the helm-operator-plugin performs a dry run resulting in the errors from above being thrown. It happens because the dry run is performed before the prehook.

I did try to arbitrarily set the password and postgresPassword using a prehook so that it is successful during the install. But for some reason it is not honored and I still end up with the errors mentioned above.

As for how the postHook affects future reconcile loops, it seems like it does by starting a reconciliation loop constantly. I was able to track down the cause for issue #10 and it looks like it is SkipDependentWatches. Setting SkipDependentWatches to true seems to fix the issue of reconciling initiated without a valid cause but the postHook still does kick off the reconciliation. Didn't realize that this would be the case until after I figured out the problem for issue #10

tumido commented 1 year ago

I see. That makes sense to me... And I must admit, this is not an easy problem to solve. Should we maybe open an issue against operator-sdk to get some help?

Also, I found out that the release object is stored as a Kubernetes Secret anyway, so when it comes to "exposing the password", it's actually not that bad...

However, I've found another problem. when I was testing this PR:

When I initially created the Backstage CR I had: image

However, after a few seconds, it reconciles to:

image

2023-10-12T21:32:32+02:00   DEBUG   controllers.Helm    Reconciliation triggered    {"backstage": "test2/lala"}
2023-10-12T21:32:32+02:00   DEBUG   controllers.Helm    preparing upgrade for lala
2023-10-12T21:32:34+02:00   ERROR   Reconciler error    {"controller": "backstage-controller", "object": {"name":"lala","namespace":"test2"}, "namespace": "test2", "name": "lala", "reconcileID": "b7687bce-d36f-4374-ba40-bbe02348e307", "error": "execution error at (backstage/charts/upstream/charts/postgresql/templates/secrets.yaml:5:24): \nPASSWORDS ERROR: You must provide your current passwords when upgrading the release.\n                 Note that even after reinstallation, old credentials may be needed as they may be kept in persistent volume claims.\n                 Further information can be obtained at https://docs.bitnami.com/general/how-to/troubleshoot-helm-chart-issues/#credential-errors-while-upgrading-chart-releases\n\n    'global.postgresql.auth.postgresPassword' must not be empty, please add '--set global.postgresql.auth.postgresPassword=$POSTGRES_PASSWORD' to the command. To get the current value:\n\n        export POSTGRES_PASSWORD=$(kubectl get secret --namespace \"test2\" lala-postgresql -o jsonpath=\"{.data.postgres-password}\" | base64 -d)\n"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
    /Users/tcoufal/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
    /Users/tcoufal/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:274
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
    /Users/tcoufal/.go/pkg/mod/sigs.k8s.io/controller-runtime@v0.14.1/pkg/internal/controller/controller.go:235

Do you get the same?

PatAKnight commented 1 year ago

I have found that if I sometimes stop the operator and restart, I will get the password error that you show above. Uninstalling the helm chart typically fixes the problem for me.

Another potential alternative that I am looking into is if we can still stick with the prehook and create our own secret that we could inject into the chart values. But I don't know if that is the route that we would want to go.

I am fine with opening up an issue to the operator-sdk to see if they might have some suggestions that might have been overlooked.

PatAKnight commented 1 year ago

I was able to use the prehook to make a secret and then add it to the chart values. But I noticed some weird issues during implementation.

My original method created the secret and added it to the upstream.postgresql.auth.password and postgresPassword fields, however I was still getting the same password error. I also noticed something weird when inspecting the chart values.

Screenshot from 2023-10-12 22-52-33

There is now a backstage field that shows up

Screenshot from 2023-10-12 22-52-30

I then tried setting backstage.postgresql.auth.password and postgresPassword instead of using upstream and it worked. I was no longer receiving the errors

tumido commented 1 year ago

Ok... That would indicate the helm (when used as a golang library) doesn't honor aliases... That would be bad, since we would have to redo our default values... (or remove the helm dependency alias from Janus IDP chart) This is weird. I'll look into that 🙁

I then tried setting backstage.postgresql.auth.password and postgresPassword instead of using upstream and it worked. I was no longer receiving the errors

Nice! Does it create additional Secret resource this way? (the default behavior is that it should always create a Secret unless existingSecret is used:

https://github.com/bitnami/charts/blob/ade6fc5fbc516534a28094843659fbf204ed1d57/bitnami/postgresql/templates/secrets.yaml#L26

https://github.com/bitnami/charts/blob/ade6fc5fbc516534a28094843659fbf204ed1d57/bitnami/postgresql/templates/_helpers.tpl#L164

Hence I would assume your setup creates 2 secrets with the same content, correct?

PatAKnight commented 1 year ago

Nice! Does it create additional Secret resource this way? (the default behavior is that it should always create a Secret unless existingSecret is used:

https://github.com/bitnami/charts/blob/ade6fc5fbc516534a28094843659fbf204ed1d57/bitnami/postgresql/templates/secrets.yaml#L26

https://github.com/bitnami/charts/blob/ade6fc5fbc516534a28094843659fbf204ed1d57/bitnami/postgresql/templates/_helpers.tpl#L164

Hence I would assume your setup creates 2 secrets with the same content, correct?

Yes, I was creating 2 secrets. But I think I can get away with just one. If we just set the password and postgresPassword, the PostgreSQL chart will create the secret using those values. I was overthinking the solution just a bit :laughing:.

The only downside is that I don't believe we will be able to allow users to define an existingSecret since it will still fail because of the dry-run. It might just have to be a limitation that we will have to document.

PatAKnight commented 1 year ago

I went ahead and created this PR that sets the password and postgresPassword fields. I am leaving this open in case there is any more discussion around the alias problem above.

sonarcloud[bot] commented 1 year ago

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
38.9% 38.9% Duplication

idea Catch issues before they fail your Quality Gate with our IDE extension sonarlint SonarLint