open-feature / open-feature-operator

A Kubernetes feature flag operator
https://openfeature.dev
Apache License 2.0

Alternative FlagD Deployment Model #250

Closed · beeme1mr closed this issue 1 year ago

beeme1mr commented 1 year ago

Problem

The OpenFeature Operator (OFO) currently supports deploying flagd as a sidecar. The flagd sidecar watches the Kubernetes API for changes to the FeatureFlagConfiguration custom resource and updates its configuration in near real time. This works well in environments that allow workload pods to access the Kubernetes API directly. However, in more restricted environments where workloads are not allowed to access the K8s API, this presents security challenges. A workaround would be to fall back to the "ConfigMap mounted as a volume" approach, but its reconciliation time of up to two minutes is too slow.
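For context, a minimal sketch of the current model: flags are defined in a FeatureFlagConfiguration custom resource, and a workload opts into sidecar injection via annotations. Field and annotation names follow the v1alpha1 API as I understand it; treat the exact shapes as illustrative.

```yaml
# Illustrative FeatureFlagConfiguration (v1alpha1-style); exact fields may differ.
apiVersion: core.openfeature.dev/v1alpha1
kind: FeatureFlagConfiguration
metadata:
  name: demo-flags
spec:
  featureFlagSpec: |
    {
      "flags": {
        "new-welcome-message": {
          "state": "ENABLED",
          "variants": { "on": true, "off": false },
          "defaultVariant": "on"
        }
      }
    }
---
# Workload opts in via annotations; OFO injects a flagd sidecar that
# watches the K8s API for changes to the CR above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 1
  selector:
    matchLabels: { app: demo-app }
  template:
    metadata:
      labels: { app: demo-app }
      annotations:
        openfeature.dev/enabled: "true"
        openfeature.dev/featureflagconfiguration: "demo-flags"
    spec:
      containers:
        - name: app
          image: example/demo-app:latest  # placeholder image
```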

Proposal

Determine and implement an alternative deployment model for FlagD. The following is a non-exhaustive list of potential options.

DaemonSet flagd Proxy

A DaemonSet could be installed on every node that hosts at least one flagd sidecar. This flagd instance would be configured not to evaluate flags itself, but to serve a representation of the relevant custom resources to the sidecar flagd instances. In other words, the DaemonSet would act as a logical proxy to the Kubernetes API, and all flagd sidecars on a node would be configured to use the node-local proxy.
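A rough sketch of what this could look like, assuming a (currently hypothetical) sync-only mode for flagd and an illustrative port; sidecars would reach the node-local proxy via a hostPort:

```yaml
# Hypothetical node-local flagd proxy. The --sync-only mode shown here
# does not exist today and is purely illustrative.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: flagd-node-proxy
  namespace: flagd-system
spec:
  selector:
    matchLabels: { app: flagd-node-proxy }
  template:
    metadata:
      labels: { app: flagd-node-proxy }
    spec:
      serviceAccountName: flagd-node-proxy  # RBAC to watch FeatureFlagConfigurations
      containers:
        - name: flagd
          image: ghcr.io/open-feature/flagd:latest
          args: ["start", "--sync-only"]    # hypothetical flag: serve sync data, no evaluation
          ports:
            - containerPort: 8015
              hostPort: 8015                # sidecars connect via the node IP
```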


DaemonSet flagd

A DaemonSet would run a flagd instance on each node. Pods would no longer include an instance of flagd, and providers would communicate directly with the node-local daemon.
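In this model, each workload's provider has to discover the node-local flagd. One way is the downward API, assuming the provider honours the conventional FLAGD_HOST / FLAGD_PORT environment variables:

```yaml
# Sketch of a workload container pointing its flagd provider at the
# DaemonSet instance on the same node (pod spec fragment).
containers:
  - name: app
    image: example/demo-app:latest   # placeholder image
    env:
      - name: FLAGD_HOST
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP # node IP where the DaemonSet flagd listens
      - name: FLAGD_PORT
        value: "8013"                # flagd's default evaluation port
```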


flagd Proxy Service

A flagd proxy service could be used. This service would have direct access to the Kubernetes API, and the flagd sidecar instances would connect to the proxy instead.
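A sketch, assuming the proxy sits behind an ordinary ClusterIP Service and the sidecars sync from it over gRPC rather than watching the API server; the sync URI syntax here is illustrative:

```yaml
# Hypothetical central flagd proxy exposed as a Service.
apiVersion: v1
kind: Service
metadata:
  name: flagd-proxy
  namespace: flagd-system
spec:
  selector:
    app: flagd-proxy
  ports:
    - name: sync
      port: 8015
      targetPort: 8015
---
# Sidecar flagd configured to sync from the proxy (pod spec fragment;
# illustrative URI syntax).
containers:
  - name: flagd
    image: ghcr.io/open-feature/flagd:latest
    args: ["start", "--uri", "grpc://flagd-proxy.flagd-system.svc:8015"]
```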


flagd Service

A flagd Service could be used. This would be an unmodified flagd instance, exposed as a regular Service, that providers connect to directly.
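Client-side, this looks like the DaemonSet flagd sketch above, but with a stable Service DNS name instead of the node IP (env var names again assume the provider's FLAGD_HOST / FLAGD_PORT convention):

```yaml
# Sketch: workload env pointing the provider at a shared flagd Service.
env:
  - name: FLAGD_HOST
    value: "flagd.flagd-system.svc.cluster.local"  # illustrative Service name
  - name: FLAGD_PORT
    value: "8013"
```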


Requirements

AlexsJones commented 1 year ago

Before analysing the various propositions too deeply, I think it's important to ask ourselves whether this is something that should be solved by the OpenFeature Operator + flagd. It sets a precedent for accepting use cases that solve a unique challenge but perhaps don't improve the quality of the overall design, and in fact make things more complex for other adopters.

A similar case could be made for custom CNI requirements and we need to, as an organisation, decide where we think we need to implement to increase adoption/longevity and quality of the project.

With that said, flagd is by design a component outside of OFO, so it is possible to embed it in another operator or pattern of your choice. Moving to a DaemonSet, for example, brings its own challenges around rolling out updates, as well as the design decisions that would need to be considered for a many-to-one (*:1) flagd implementation.

I am going to think on this more, but wanted to share my opening thoughts.

beeme1mr commented 1 year ago

In my opinion, the current architecture should remain the default. However, it appears to be relatively common that workloads are not able to access the Kubernetes API directly. In those environments, an alternate deployment model would be a requirement for adoption.

As you mentioned, all of the proposed alternative architectures come with different pros and cons, and there are very likely additional options I haven't considered. This issue was intended to present the problem statement, and I'm hopeful we can find a solution that avoids adding too much complexity.

toddbaert commented 1 year ago

At Kubecon, I remember a stranger recommending we use the operator itself as a logical proxy for flagd instances to listen to, instead of the API server. Perhaps that's an option too, though I haven't given it too much thought.

AlexsJones commented 1 year ago

My conclusion from this current conversation is that there are some scenarios where a company might want to isolate access to the API server from specific pods.

I think it's reasonable that in those scenarios they would expect pods that require the API server to do so by some other means.

The challenge with the current suggestions, for me, is that there is no one-size-fits-all approach. It's largely contextual, depending on your organisation's deployment, policies, and cluster setup.

Probably the simplest solution would be to make the Kubernetes endpoint in flagd configurable from open-feature-operator via the FeatureFlagConfiguration or some other CRD. That way, regardless of what or where the API server is, flagd doesn't care (of course, you'd also have to provide a secret ref for the PEM file).
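As a sketch, such a CRD extension might look something like this; the field names are entirely hypothetical:

```yaml
# Hypothetical extension to FeatureFlagConfiguration: override where the
# injected flagd's Kubernetes sync points, plus a secret ref for the CA cert.
apiVersion: core.openfeature.dev/v1alpha1
kind: FeatureFlagConfiguration
metadata:
  name: demo-flags
spec:
  syncProvider:
    name: kubernetes
    endpoint: https://my-api-proxy.internal:6443  # any API-server-compatible endpoint
    certSecretRef:
      name: api-proxy-ca   # secret holding the PEM file mentioned above
      key: ca.pem
```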


beeme1mr commented 1 year ago

Yeah, I agree. We should be able to achieve this in a more elegant way with the gRPC sync. I'm going to close this issue since it seems clear that the original proposal was the wrong approach.