spiffe / spire-controller-manager

Kubernetes controller manager that reconciles workload registration and federation relationships.
Apache License 2.0

Gradual rollout / multicluster support #312

Open StupidScience opened 4 months ago

StupidScience commented 4 months ago

Hello.

We are currently considering migrating from the deprecated k8s-registrar to spire-controller-manager, and we're facing a few challenges:

Context

We have a multi-cluster setup for a single trust domain. Each cluster has its own SPIRE installation with servers, agents, and k8s-registrar in reconciler mode. The SPIRE servers share a database, though. In reconciler mode, each cluster's k8s-registrar is responsible only for its own cluster and does not touch entries that belong to another cluster.

Once we install spire-controller-manager, even in just one cluster, it immediately removes all entries registered by k8s-registrar (or in any other way). As a result, k8s-registrar loses its permission to register anything, all registered entries for all k8s clusters are gone along with all static entries, and all federation relationships defined in the SPIRE server config are also removed, with no visible attempt to recover them. If we installed the controller in several clusters, I imagine they would constantly remove each other's entries.

I briefly looked into the code and this seems to be expected behaviour: the controller becomes the only source of truth for all entries.

What we would like to add:

Dry-run mode

The controller would only print out what it is going to do instead of actually updating, deleting, etc.
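
A minimal sketch of what such a dry-run mode could look like, written in Python purely for illustration (the controller itself is written in Go). Everything here is hypothetical: the `Action` type, the `plan` and `reconcile` helpers, and the modelling of entries as plain dicts keyed by an ID are assumptions, not part of spire-controller-manager.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Action:
    kind: str       # "create", "update" or "delete"
    entry_id: str


def plan(desired: Dict[str, dict], actual: Dict[str, dict]) -> List[Action]:
    """Compute the actions needed to turn `actual` into `desired`."""
    actions = []
    for entry_id, entry in sorted(desired.items()):
        if entry_id not in actual:
            actions.append(Action("create", entry_id))
        elif actual[entry_id] != entry:
            actions.append(Action("update", entry_id))
    for entry_id in sorted(actual):
        if entry_id not in desired:
            actions.append(Action("delete", entry_id))
    return actions


def reconcile(
    desired: Dict[str, dict],
    actual: Dict[str, dict],
    execute: Callable[[Action], None],
    dry_run: bool = True,
    log: Callable[[str], None] = print,
) -> List[Action]:
    """Log the planned actions in dry-run mode, otherwise execute them."""
    actions = plan(desired, actual)
    for action in actions:
        if dry_run:
            log(f"[dry-run] would {action.kind} entry {action.entry_id}")
        else:
            execute(action)
    return actions
```

In this sketch the plan is computed the same way in both modes, so the dry-run output shows exactly what a real run would attempt against the same state.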

Ownership mechanism

The controller would only look at entries that it owns. This would help with both the multi-cluster setup and gradual migration.

Possible solution for ownership

The controller manager could add some metadata to each entry's hint, e.g. Hint: owner=cluster-1. In that case the controller manager for cluster-1 would touch only the records it owns and skip all others.

By default this ownership could be disabled so that no breaking change is introduced, and other flags could be added to take ownership of existing objects if required.
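
The filtering idea can be sketched roughly as follows, again in Python for illustration only. The `owner=` hint format is taken from the example above; the `owner_of` and `owned_entries` helpers and the dict-based entry model are hypothetical and do not reflect the controller's actual code or flags.

```python
from typing import Iterable, List, Optional

OWNER_PREFIX = "owner="  # assumed hint format, e.g. "owner=cluster-1"


def owner_of(entry: dict) -> Optional[str]:
    """Return the owner recorded in the entry's hint, or None if there is none."""
    hint = entry.get("hint") or ""
    if hint.startswith(OWNER_PREFIX):
        return hint[len(OWNER_PREFIX):]
    return None


def owned_entries(entries: Iterable[dict], owner: Optional[str]) -> List[dict]:
    """Return the entries this controller is allowed to reconcile.

    owner=None models ownership being disabled (the default), in which
    case every entry is in scope, matching today's behaviour.
    """
    entries = list(entries)
    if owner is None:
        return entries
    return [e for e in entries if owner_of(e) == owner]
```

With ownership enabled, entries created by k8s-registrar or by hand carry no matching hint and are therefore left alone, which is what would make a gradual migration possible.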

I believe external-dns uses a somewhat similar mechanism, with ownership tracked via TXT records.

It is not clear to me what would be best for "static" entries from outside k8s and for federations, so I would appreciate your input.

Let me know if you want me to split this issue into multiple ones.

riuvshyn commented 4 months ago

Static entries should actually be fine: both clusters can have identical ClusterStaticEntry resources and both controllers can reconcile them. It feels like the only problem is with k8s workloads, which are cluster specific.

StupidScience commented 4 months ago

@kfox1111 you mentioned some hint-based filtering in https://github.com/spiffe/spire/issues/4898. Is it a WIP/PoC somewhere, or did I misinterpret it or just not find it in this repo?

kfox1111 commented 4 months ago

Something I tried just on my own box.

kfox1111 commented 3 months ago

@StupidScience Have a look at https://github.com/spiffe/spire-controller-manager/pull/325

StupidScience commented 3 months ago

@kfox1111 thanks, I checked and conceptually (didn't look thoroughly through the code) it should work for our use case at least. How will that behave with ClusterStaticEntry resources tho? Or is it only for ClusterSPIFFEID?

Will try to elaborate a bit:

In my understanding, ClusterStaticEntry resources are not actually cluster specific but rather trust domain specific. So in this case, will each controller try to create its own entry? Will that actually work?

kfox1111 commented 3 months ago

It should work for static entries as well.

The change only filters what the controller manager looks at when reconciling entries. Agents use the full set.

Multiple controller managers might still fight if the unique spiffeid/selectors/parentid combination is the same across multiple clusters. But otherwise, it should work, I think.
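
To illustrate the remaining conflict case, here is a rough Python sketch that finds entries several clusters would claim with the same spiffeid/parentid/selectors combination. The `entry_key` and `contested_keys` helpers and the dict field names are hypothetical, and the assumption that this triple is what makes an entry unique is taken from the comment above rather than checked against the SPIRE server code.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple

Key = Tuple[str, str, FrozenSet[str]]


def entry_key(entry: dict) -> Key:
    """Identity of an entry: SPIFFE ID, parent ID and the set of selectors."""
    return (entry["spiffe_id"], entry["parent_id"], frozenset(entry["selectors"]))


def contested_keys(desired_by_cluster: Dict[str, Iterable[dict]]) -> Dict[Key, List[str]]:
    """Map each entry identity wanted by more than one cluster to those clusters."""
    claimed = defaultdict(set)
    for cluster, entries in desired_by_cluster.items():
        for entry in entries:
            claimed[entry_key(entry)].add(cluster)
    return {
        key: sorted(clusters)
        for key, clusters in claimed.items()
        if len(clusters) > 1
    }
```

An empty result would suggest the controllers have nothing to fight over; any key that shows up is a candidate for the conflict described above, for example an identical ClusterStaticEntry applied in two clusters.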