open-feature / open-feature-operator

A Kubernetes feature flag operator
https://openfeature.dev
Apache License 2.0

Flagd sync breaks after chart update #292

Closed beeme1mr closed 1 year ago

beeme1mr commented 1 year ago

Overview

Updating the OFO via helm causes existing flagd subscriptions to stop working. This seems to be related to the service account used by flagd to subscribe to the Kubernetes API.

Steps to reproduce

  1. Deploy OFO using helm version 0.2.22
  2. Deploy a workload that requires flagd to be injected by OFO
  3. Test that flag updates are synced
  4. Update OFO to version 0.2.23 using helm
  5. Attempt to update a flag configuration
  6. Confirm that flagd did not receive the flag update
  7. Recreate the workload
  8. Confirm that the new flagd is receiving flag updates

Logs

Logs output by flagd:

```
 ______   __       ________   _______    ______
/_____/\ /_/\     /_______/\ /______/\  /_____/\
\::::_\/_\:\ \    \::: _  \ \\::::__\/__\:::_ \ \
 \:\/___/\\:\ \    \::(_)  \ \\:\ /____/\\:\ \ \ \
  \:::._\/ \:\ \____\:: __  \ \\:\\_  _\/ \:\ \ \ \
   \:\ \    \:\/___/\\:.\ \  \ \\:\_\ \ \  \:\/.:| |
    \_\/     \_____\/ \__\/\__\/ \_____\/   \____/_/

{"level":"warn","ts":1673279286.986885,"caller":"kubernetes/kubernetes_sync.go:57","msg":"Client not initialised","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279286.988456,"caller":"kubernetes/kubernetes_sync.go:104","msg":"Starting kubernetes sync notifier for resource dev/end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279287.0330749,"caller":"service/connect_service.go:109","msg":"metrics listening at 8014","component":"service"}
{"level":"info","ts":1673279287.1486142,"caller":"kubernetes/kubernetes_sync.go:136","msg":"kube sync notifier event: add dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279287.150566,"caller":"kubernetes/kubernetes_sync.go:136","msg":"kube sync notifier event: add dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279287.160034,"caller":"runtime/runtime.go:84","msg":"configuration change (write) for flagKey hex-color (dev/end-to-end)","component":"runtime"}
{"level":"info","ts":1673279347.1486418,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279347.1491268,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279353.633195,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279353.6428063,"caller":"runtime/runtime.go:84","msg":"configuration change (update) for flagKey hex-color (dev/end-to-end)","component":"runtime"}
{"level":"info","ts":1673279407.1487696,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279407.1490867,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279443.8719244,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"error","ts":1673279443.8738065,"caller":"runtime/runtime.go:61","msg":"fetch: featureflagconfigurations.core.openfeature.dev \"end-to-end\" is forbidden: User \"system:serviceaccount:dev:default\" cannot get resource \"featureflagconfigurations\" in API group \"core.openfeature.dev\" in the namespace \"dev\"","component":"runtime","stacktrace":"github.com/open-feature/flagd/pkg/runtime.(*Runtime).startSyncer\n\t/workspace/pkg/runtime/runtime.go:61\ngithub.com/open-feature/flagd/pkg/runtime.(*Runtime).Start.func1\n\t/workspace/pkg/runtime/start.go:31\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220929204114-8fcdb60fdcc0/errgroup/errgroup.go:75"}
{"level":"info","ts":1673279467.1476731,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279467.1482906,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279527.148567,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279527.1488688,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279587.1494663,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279587.149673,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279647.1501207,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279647.150266,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279707.1502128,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
{"level":"info","ts":1673279707.1507156,"caller":"kubernetes/kubernetes_sync.go:142","msg":"kube sync notifier event: update dev end-to-end","component":"sync","sync":"kubernetes"}
E0109 15:55:39.151978 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: unknown
W0109 15:55:40.274607 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:55:40.274644 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:55:43.169613 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:55:43.169648 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:55:48.773706 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:55:48.773739 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:56:01.272456 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:56:01.272487 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:56:16.079723 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:56:16.079764 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:56:45.909843 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:56:45.909875 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:57:32.709040 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:57:32.709074 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:58:16.944866 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:58:16.944903 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
W0109 15:58:50.831425 1 reflector.go:424] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
E0109 15:58:50.831508 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.4/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope
```

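To pick the RBAC failures out of a long flagd log like the one above, something along these lines may help. It is only a sketch: the `rbac_denials` helper and the regular expression are my own and not part of flagd, and the two sample lines are abbreviated copies of entries from the log.

```python
import json
import re

# Matches the denial text as it appears in the log above (hypothetical helper pattern).
FORBIDDEN = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)"'
)

def rbac_denials(lines):
    """Return the set of (user, verb, resource) triples denied in the given log lines."""
    denials = set()
    for line in lines:
        text = line
        # flagd's own entries are JSON; the client-go reflector lines are plain text.
        if line.lstrip().startswith("{"):
            try:
                text = json.loads(line).get("msg", "")
            except json.JSONDecodeError:
                pass  # not valid JSON after all, so scan the raw line instead
        match = FORBIDDEN.search(text)
        if match:
            denials.add((match["user"], match["verb"], match["resource"]))
    return denials

# Abbreviated copies of two entries from the log above.
sample = [
    '{"level":"error","msg":"fetch: featureflagconfigurations.core.openfeature.dev \\"end-to-end\\" is forbidden: User \\"system:serviceaccount:dev:default\\" cannot get resource \\"featureflagconfigurations\\" in API group \\"core.openfeature.dev\\" in the namespace \\"dev\\""}',
    'W0109 15:55:40.274607 1 reflector.go:424] failed to list *unstructured.Unstructured: featureflagconfigurations.core.openfeature.dev is forbidden: User "system:serviceaccount:dev:default" cannot list resource "featureflagconfigurations" in API group "core.openfeature.dev" at the cluster scope',
]
denials = rbac_denials(sample)
for user, verb, resource in sorted(denials):
    print(f"{user} is denied {verb} on {resource}")
```

In this log the same service account is denied both `get` (by flagd's fetch) and `list` (by the client-go reflector), which points at the binding rather than at a single verb.
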
james-milligan commented 1 year ago

I've investigated this bug and found the cause:

Helm 3 uses a three-way strategic merge patch (instead of the two-way merge used in Helm 2). This means that the current state of a resource is taken into account when actions such as rollback or upgrade take place. As a result, any external changes to the deployed resources are reset to the state defined in the chart rather than persisting. For example:

helm install myapp ./myapp (containing a deployment with 3 replicas)
kubectl scale --replicas=0 deployment/myapp
helm rollback myapp

results in the deployment of myapp having 3 replicas.
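The difference between the two strategies can be sketched with flat dictionaries. This is a deliberately simplified model, not Helm's actual patch logic, and the function names are made up for illustration.

```python
def two_way_patch(old_manifest, new_manifest):
    """Two-way: patch only the fields that differ between the old and new manifests."""
    return {k: v for k, v in new_manifest.items() if old_manifest.get(k) != v}

def three_way_patch(old_manifest, live_state, new_manifest):
    """Three-way: also patch chart-managed fields whose live value has drifted."""
    # old_manifest is unused in this toy model; the real merge also uses it,
    # for example to detect fields that the chart has removed.
    return {k: v for k, v in new_manifest.items() if live_state.get(k) != v}

def apply_patch(live_state, patch):
    """Return a copy of the live state with the patch applied."""
    merged = dict(live_state)
    merged.update(patch)
    return merged

chart = {"replicas": 3}
live = apply_patch({}, chart)   # helm install myapp ./myapp
live["replicas"] = 0            # kubectl scale --replicas=0 deployment/myapp

# helm rollback myapp: the old and new manifests are the same chart
after_two_way = apply_patch(live, two_way_patch(chart, chart))
after_three_way = apply_patch(live, three_way_patch(chart, live, chart))
print(after_two_way, after_three_way)  # {'replicas': 0} {'replicas': 3}
```

With the two-way merge the manifests are identical, so nothing is patched and the manual scale-down survives. The three-way merge sees that the live state has drifted from the manifest and resets it.
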

This behaviour breaks the logic used within the pod_webhook. When a pod requests access to the FeatureFlagConfiguration, its service account is added to the flagd-kubernetes-sync cluster role binding, granting access to the get, watch and list verbs for the core.openfeature.dev API resources. This is a requirement for the Kubernetes sync. Over time the cluster role binding's subjects array grows as service accounts are added.

If we then run a helm upgrade, this mutated state of the flagd-kubernetes-sync cluster role binding is rolled back to the new desired state held within the chart, scrubbing the permissions for any previously authorised pods, with the only method of 're-adding' the permissions being a pod restart.
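The sequence described above can be modelled roughly as follows. This is a sketch, not the operator's code: the binding is a plain dictionary, the chart's subjects list is assumed to be empty (the real chart may ship other subjects), and both function names are hypothetical.

```python
import copy

# Hypothetical rendering of the binding shipped in the chart; the real subjects may differ.
CHART_BINDING = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "flagd-kubernetes-sync"},
    "subjects": [],
}

def add_service_account(binding, namespace, name):
    """Mimic the webhook: append the pod's service account if it is not already bound."""
    subject = {"kind": "ServiceAccount", "namespace": namespace, "name": name}
    if subject not in binding["subjects"]:
        binding["subjects"].append(subject)

def helm_upgrade(chart_binding):
    """Mimic the upgrade: the live object is reset to what the chart renders."""
    return copy.deepcopy(chart_binding)

live = copy.deepcopy(CHART_BINDING)
add_service_account(live, "dev", "default")  # pod admitted before the upgrade
add_service_account(live, "dev", "default")  # second pod, same account: no duplicate
before = len(live["subjects"])

live = helm_upgrade(CHART_BINDING)           # subjects added by the webhook are gone
after = len(live["subjects"])

add_service_account(live, "dev", "default")  # recreating the pod re-adds the subject
print(before, after, len(live["subjects"]))  # 1 0 1
```

The webhook only runs when a pod is admitted, so nothing restores the subject for pods that were already running at the time of the upgrade.
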

There are a few options for solving this issue, and I am certain that this list is not exhaustive:

cc @AlexsJones

Kavindu-Dodan commented 1 year ago

I assume Helm uses a merge strategy similar to the one defined by Kubernetes, for example when merging lists [1]. The documentation says: "If no patchStrategy is specified for a field of type list, then the list is replaced." This could be the root cause.

Of the suggested solutions, I think it is best to avoid wildcards, due to security concerns, and instead implement a sync mechanism in the operator.
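To illustrate the two list behaviours described in [1], here is a simplified sketch. `merge_list` is not a Kubernetes API, the subject names are invented, and keying on `name` alone is for illustration only.

```python
def merge_list(live, desired, patch_strategy=None, merge_key=None):
    """Combine two lists the way a strategic merge patch treats list fields (simplified)."""
    if patch_strategy != "merge":
        # No patchStrategy: the desired list replaces the live one wholesale.
        return list(desired)
    # patchStrategy "merge": items are matched on the merge key and combined.
    merged = {item[merge_key]: item for item in live}
    merged.update({item[merge_key]: item for item in desired})
    return list(merged.values())

# Invented subjects: one added at runtime, one defined in the chart.
live_subjects = [{"name": "default", "namespace": "dev"}]
chart_subjects = [{"name": "operator", "namespace": "ofo-system"}]

replaced = merge_list(live_subjects, chart_subjects)
merged = merge_list(live_subjects, chart_subjects, "merge", "name")
print(len(replaced), len(merged))  # 1 2
```

If the subjects field falls into the first case, as the quoted sentence suggests, the runtime-added entry is dropped on upgrade.
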

[1] - https://kubernetes.io/docs/tasks/manage-kubernetes-objects/declarative-config/#merging-changes-for-fields-of-type-list
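A sync mechanism in the operator could look roughly like this. It is only a sketch under my own assumptions: pods are plain dictionaries, and the `flagd_injected` and `service_account` keys and both function names are hypothetical, not the operator's API.

```python
def wanted_subjects(pods):
    """Build one ServiceAccount subject per distinct account used by flagd-injected pods."""
    accounts = {
        (pod["namespace"], pod.get("service_account", "default"))
        for pod in pods
        if pod.get("flagd_injected")
    }
    return [
        {"kind": "ServiceAccount", "namespace": ns, "name": name}
        for ns, name in sorted(accounts)
    ]

def reconcile(binding, pods):
    """Return (binding, changed), adding any subjects that running pods still need."""
    current = binding.get("subjects", [])
    missing = [s for s in wanted_subjects(pods) if s not in current]
    if not missing:
        return binding, False
    updated = dict(binding)
    updated["subjects"] = current + missing  # keep existing subjects, append the rest
    return updated, True

pods = [
    {"namespace": "dev", "service_account": "default", "flagd_injected": True},
    {"namespace": "dev", "service_account": "default", "flagd_injected": True},
    {"namespace": "ops", "service_account": "batch", "flagd_injected": False},
]
wiped = {"metadata": {"name": "flagd-kubernetes-sync"}, "subjects": []}  # state after an upgrade

repaired, changed = reconcile(wiped, pods)
_, changed_again = reconcile(repaired, pods)
print(changed, changed_again)  # True False
```

Running this on operator start-up or on a timer would restore access for existing pods after an upgrade without a restart. The sketch only adds subjects and never removes stale ones.
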

Kavindu-Dodan commented 1 year ago

Another option is to manage cluster role bindings through the operator itself, making it independent of the Helm layer. However, this has the potential to pollute clusters when OFO is uninstalled and would require complex logic to handle all scenarios.