tuenti / secrets-manager

A daemon to sync Vault secrets to Kubernetes secrets
Apache License 2.0
171 stars 26 forks source link

Re-write watchNamespaces/excludeNamespaces to use MultiNamespacedCacheBuilder #59

Closed avenging closed 4 years ago

avenging commented 4 years ago

I would like to propose a re-write of the code for restricting namespaces that secrets-manager manages secretdefinitions in. My aims are:

  1. Remove the need for secrets-manager to have a clusterrole to read secretdefinitions at the cluster scope.
  2. Allow secrets-manager to be installed on a per namespace basis or manage only set namespaces without the need for a cluster role
  3. Enhance the ability for secrets-manager to be installed in a mult-tennant cluster.

I suggest changing the manager to use cache.MultiNamespacedCacheBuilder when a namespace restrictions have been configured. This way the cache for the manager will only contain information about namespaces that are set in the watchList. This will remove the need for a clusterrole to read all secretdefinitions and secrets-manager will only need permissions to secretdefinitions in the namespaces for which it is configured.

When no namespace restrictions are configured then the manager will continue to manage all namespaces in the cluster and secrets-manager will continue to need a clusterrole.

I would like to propose the removal of the excludeNamespaces option as this would still (in my opinion) require a clusterrole that allows the listing of all namespaces in the cluster to allow logic to work out the list of namespaces to watch instead of the current check in each reconcile.

Remove all code relating to checking for a watched namespace from the controller and controller creation as it will no longer be needed as it will not know anything about secretdefinitions in namespaces not defined in the manager.

To this end I have written some code for review that achieves the goals I am looking for that might be useful. At the moment it still takes account of excludeNamespaces, but as I say I would prefer to remove this code.

I have tried to write tests for this but the new code paths require starting and stopping the manager with different configurations. To me this means starting and stopping test environments throughout testing which I had significant issues with and cannot resolve. Manual testing shows the code to work though.

fcgravalos commented 4 years ago

Hey @avenging!

While I like the idea, to me having a clusterrole to watch the object that the controller is supposed to be watching, it's not a big a deal.

secrets-manager it's installed in his own namespace and in multi-tenant clusters at Tuenti, though I agree that requires a cluster scope. On the other hand if you use the multinamespaced cache builder, you still need a clusterrole to list all namespaces, right? I may not understand how do you plan to use this though.

Testing is something that we take seriously, so while I'm happy you are looking into this, we need to find a way to make this code testable. I know that using caches complicates this.

Let's keep talking about this.

cc @dannyk81

avenging commented 4 years ago

Using the multinamespaced cache builder doesn't require a cluster scope to list namespaces, it only requires permissions for secrets/secretdefinitions on the namespaces you want to monitor.

I'm not sure I was clear enough about the concerns that we have in using a single secrets manager in the cluster. Having a single instance means that in the Vault backend it needs permissions to every possible secret you may want to sync. It means that any user in any namespace can create a secredefinition for any secret in Vault that the approle has access to. We don't think is ideal (unless we are missing something.....)

The cluster scoped permission for list namespaces is so the code has the ability to use the excludeNamespaces setting by listing all namespaces, then removing all the excluded namespaces to get the final list. Hence I kind of want to remove excludeNamespaces so that is no longer needed and then that code can disappear along with the cluster role.

I will take a look at the test code I tired to write and paste some here to see if we can solve the testing issues I was having.