open-feature / open-feature-operator

A Kubernetes feature flag operator
https://openfeature.dev
Apache License 2.0
181 stars, 35 forks

Sidecar is not injected during Kubernetes Cluster startup #654

Closed udsprasad closed 3 months ago

udsprasad commented 4 months ago

Hi Team,

We're excited about integrating OpenFeature into our current project. Every day, we restart our whole Kubernetes cluster. However, during startup, the OpenFeature operator fails to inject the flagd sidecar into the respective components. Our assumption was that all services would come up simultaneously during startup. It seems that the operator isn't yet healthy enough to inject the sidecar when the pods are created. Do you have any ideas on how to fix this?

toddbaert commented 4 months ago

The operator will ensure that any correctly annotated deployments have flagd injected, but the operator must be healthy before this is true. Until the operator is healthy, none of its admission webhooks can work as expected.

I'm not sure exactly how you are restarting your cluster, but I suppose I would recommend you alter whatever script or automation is doing that to wait for OFO to be ready before you start deploying "normal" workloads. Something like:

kubectl wait --timeout=60s --for condition=Available=True deploy --all -n 'open-feature-operator-system'

This waits up to 60s for the components in the OFO namespace ('open-feature-operator-system' by default) to start up.
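As a rough sketch of how that could look inside a startup script: the function names and the manifests directory argument below are hypothetical, and the namespace and timeout are simply the defaults from the command above. It only illustrates the ordering (wait for OFO, then deploy), not a tested recipe for any particular cluster setup.

```shell
#!/usr/bin/env sh
set -eu

# Defaults taken from the command above; override via environment if needed.
OFO_NAMESPACE="${OFO_NAMESPACE:-open-feature-operator-system}"
OFO_TIMEOUT="${OFO_TIMEOUT:-60s}"

# Block until every Deployment in the OFO namespace reports Available=True.
# Returns non-zero if the timeout expires first.
wait_for_ofo() {
  kubectl wait --timeout="${OFO_TIMEOUT}" \
    --for=condition=Available=True deploy --all -n "${OFO_NAMESPACE}"
}

# Wait for the operator, then apply the annotated workloads.
# $1 is a hypothetical directory holding your workload manifests.
start_workloads() {
  wait_for_ofo
  kubectl apply -f "$1"
}
```

Calling `start_workloads ./workloads` from the automation would then stop (because of `set -e`) if the operator does not become available in time, instead of creating pods that miss the sidecar.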

Maybe this helps?

dabump commented 4 months ago

@toddbaert - Thanks for your quick reply! Just to add context on why we restart: our non-production k8s cluster is scaled down daily to save AWS EC2 costs.

When it is scaled back up the next day, we notice that the sidecar is not injected, because the operator is unhealthy at the time of pod creation.

So we thought it best to reach out to the community to hear about possible strategies or workarounds.

Oro commented 3 months ago

Very much a drive-by comment, in the hope that this helps. I have not tested it, but a few years ago I ran into a similar situation, with a routine shutdown of a full cluster, for a different operator:

You'd need to change the MutatingWebhookConfiguration that injects the sidecar to something like the following, and add the respective label (in addition to the annotations) to the pods that should get the sidecar.

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  annotations:
    cert-manager.io/inject-ca-from: open-feature-operator/open-feature-operator-serving-cert
  labels:
    app.kubernetes.io/instance: flagd
  name: open-feature-operator-mutating-webhook-configuration
webhooks:
  - admissionReviewVersions:
      - v1
    clientConfig:
      service:
        name: open-feature-operator-webhook-service
        namespace: open-feature-operator
        path: /mutate-v1-pod
    failurePolicy: Fail # Do not spawn a new pod if the webhook is unavailable
    objectSelector: # Only trigger the webhook on objects with this label. Without that, the flagd operator itself would not be able to spawn
      matchLabels:
        foo: bar
    name: mutate.openfeature.dev
    rules:
      - apiGroups:
          - ''
        apiVersions:
          - v1
        operations:
          - CREATE
          - UPDATE
        resources:
          - pods
    sideEffects: NoneOnDryRun

NB: I do not think this is currently configurable in the helm chart and, again, I have not tested that in any way. Hope it helps.
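For the label part, one way to add it to an existing workload is to patch the Deployment's pod template. This is only a sketch: the function name and the example deployment and namespace are hypothetical, and `foo: bar` is just the placeholder label from the objectSelector above, so substitute whatever label you actually select on.

```shell
#!/usr/bin/env sh
set -eu

# Add the label matched by the webhook's objectSelector to the pod template
# of a Deployment. Changing the pod template triggers a new rollout.
# $1 = deployment name, $2 = namespace (both hypothetical examples).
label_pod_template() {
  kubectl patch deployment "$1" -n "$2" --type merge \
    -p '{"spec":{"template":{"metadata":{"labels":{"foo":"bar"}}}}}'
}
```

For example, `label_pod_template my-app default` would label the pods of a deployment called `my-app`. The label has to be on the pod template rather than on the Deployment object itself, since the webhook rules above match pods.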

udsprasad commented 3 months ago

Thanks @Oro for your reply. I hope it will help us. I will give it a try.

udsprasad commented 3 months ago

Once again, thanks @Oro. Your solution is working fine.