kubernetes-sigs / secrets-store-csi-driver

Secrets Store CSI driver for Kubernetes secrets - Integrates secrets stores with Kubernetes via a CSI volume.
https://secrets-store-csi-driver.sigs.k8s.io/
Apache License 2.0

Secret object not renewed on change in SecretProviderClass #389

Closed: Dyllaann closed this issue 5 months ago

Dyllaann commented 3 years ago

Hi! I've run into the following issue, which I also mentioned in #321, but I don't think it was explained properly there. I'll try to describe it with a scenario below.

What steps did you take and what happened: Assume scenario 1, where an application is deployed with the following snippets. SecretProviderClass.yaml (partial):

secretObjects:
  - data:
    - key: mysecret1
      objectName: mysecret1
    secretName: secret-object
    type: Opaque

Deployment.yaml (Partial)

env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1

The pod boots up, mounts the volume, the secret is created, and the environment variable is populated from the secret. Now we add a second secret to the SecretProviderClass:

secretObjects:
  - data:
    - key: mysecret1
      objectName: mysecret1
    - key: mysecret2
      objectName: mysecret2
    secretName: secret-object
    type: Opaque

and reference it in the pod in the same way as the first secret:

env:
- name: MYSECRET1_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret1
- name: MYSECRET2_ENV_VAR
  valueFrom:
    secretKeyRef:
      name: secret-object
      key: mysecret2

and update the deployment.

Now the pod will be recreated because its spec changed. The result of this is:

  1. The pod will boot up and mount the CSI volume
  2. The secrets are both correctly mounted under /mnt/secrets-store/mysecret1 and /mnt/secrets-store/mysecret2
  3. The secret object secret-object is not updated, so it still contains only mysecret1
  4. The environment variable reference for MYSECRET2_ENV_VAR fails, because mysecret2 is not in the secret.

What did you expect to happen: I expected the secret to change. After the deployment update and the pod restart, I expected the secret object secret-object to contain both keys, mysecret1 and mysecret2.
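
For illustration, this is roughly what I expected secret-object to look like after the update (the data values are placeholders for whatever is synced from Key Vault):

apiVersion: v1
kind: Secret
metadata:
  name: secret-object
type: Opaque
data:
  mysecret1: PLACEHOLDER_BASE64_VALUE   # synced from the provider
  mysecret2: PLACEHOLDER_BASE64_VALUE   # synced from the provider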

Anything else you would like to add: The problem with this is that I don't need the reconciler feature itself. My secrets that need to be pulled from Key Vault are not updated periodically; I just need to change which secrets are mounted in each pod. If I were to enable the reconciler, there is still the problem that my pod will be error-looping until the reconciler has renewed the secret object. Besides that, I don't see the use case for enabling the reconciler with a refresh interval of 10 seconds. On the other hand, setting it to an hour could mean my pod doesn't start up for 59 minutes because it is unable to resolve the secretKeyRef.

Which provider are you using: Azure Key Vault

Environment:

aramase commented 3 years ago

@Dyllaann Thank you for the detailed explanation. The behavior you're observing is currently expected.

The Secrets Store CSI Driver has two reconcilers:

  1. The reconciler that creates the initial Kubernetes secret when the pod is created. This reconciler never updates the secret if it already exists, because if the SecretProviderClass is updated and a new pod is then created referencing it, pod1 and pod2 would end up with inconsistent values.
  2. The rotation reconciler, which updates the mount and the Kubernetes secret with the latest values after each rotation interval.
    • If the pod is restarted after rotation takes place, the pods will use the latest Kubernetes secret, which contains the new keys added to the SecretProviderClass.
    • If the pod is restarted before rotation, the first reconciler will see that the secret exists and will not update it with the new keys.

Without the rotation feature, if you want to update the pod and the Kubernetes secret, you can follow the instructions here - https://github.com/kubernetes-sigs/secrets-store-csi-driver/blob/master/docs/README.limitations.md#mounted-content-and-kubernetes-secret-not-updated-after-secret-is-updated-in-external-secrets-store. tl;dr: if syncing as a Kubernetes secret, delete the secret and then restart the pod; a new secret will be created based on the latest SecretProviderClass.

This separation between the reconcilers is meant to avoid inconsistency and the conflicts that would arise from two reconcilers updating the same objects over and over again.
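
For reference, enabling the rotation feature is typically done through the driver's Helm chart values; the snippet below is a minimal sketch, and the parameter names should be verified against the chart version you deploy:

# values.yaml for the secrets-store-csi-driver Helm chart (sketch)
enableSecretRotation: true   # turn on the rotation reconciler
rotationPollInterval: 2m     # how often mounted content and synced Kubernetes secrets are re-checked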

Dyllaann commented 3 years ago

Thanks for clarifying that. I can understand there will be use cases where the inconsistency will cause problems. Unfortunately, in my case I use a SecretProviderClass for each individual application (not sure if this is the intended usage, but hey, it works partially). If the SecretProviderClass backing a deployment/daemonset is changed, that means there are probably also changes to the pod itself, and all of the pods will be replaced anyway.

If there is a deployment with two replicas and pod1 is being renewed using the new SecretProviderClass, I do agree pod2 will experience a short inconsistency from being created against an older version of the secret object. However, after pod1 is successfully renewed, I expect pod2 to follow.

In any case, I understand keeping support for the current behaviour is a requirement. Would it be possible to add a boolean to the SecretProviderClass that can indicate whether to clear the old secret upon a new volume mount when the current secret object no longer matches the secretObjects: from the SecretProviderClass?

Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

Dyllaann commented 3 years ago

/remove-lifecycle stale

dhananjaya94 commented 3 years ago

> Having to manually delete each secret is a blocker for us, as there could be the scenario where a change is made in a Helm chart that doesn't involve secrets and/or pod changes. If CI/CD deletes the secret before running helm upgrade, there will be no new secret object created because the pods don't restart. That'll only create problems further down the line.

Ran into the same issue; the secret had to be deleted manually when a new object version was added.

dhananjaya94 commented 3 years ago

@aramase, could you explain how the rotation reconciler works when a new object version is set in Azure Key Vault? The SecretProviderClass is applied to the cluster with the new object version. The new version of the secret is mounted in the pod, but it was not updated in the Kubernetes secret or in the pod environment variables. The rotation setting is left at its default value (https://github.com/Azure/secrets-store-csi-driver-provider-azure/blob/04c1fae211b522a84a7818cac7b7daaef1ca9ef2/charts/csi-secrets-store-provider-azure/values.yaml#L110). I had to delete the secret and recreate the pod; only after that was the correct value set in the env.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

endeepak commented 3 years ago

/remove-lifecycle stale

mmerickel commented 3 years ago

Not having each pod contain a consistent view of the external secret in a way that can be updated declaratively is a dealbreaker for this driver. We want to apply an updated SecretProviderClass and Deployment and, when new pods are created, the class should ensure those new pods are loaded with 1) any changes made to the external secrets and 2) any changes made to the provider class.

At least the driver should support a force-refresh of the secrets when the SecretProviderClass is modified regardless of the normal refresh interval.

brucedvgw commented 3 years ago

Jumping in a little late. Has this been resolved? I'm hitting the same issue where the env secret is not rotated.

rlaveycal commented 3 years ago

Would giving the k8s secret a dynamic name (e.g. appending the chart version) be a workaround? This way each deployment of a new version would result in a new secret, which therefore requires a new call to the vault.

mmerickel commented 3 years ago

@rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works but I suspect it will. In helm you can use .Release.Revision.

tyanko1 commented 3 years ago

> @rlaveycal this was my stop-gap solution as well - haven't tried it yet to confirm it works but I suspect it will. In helm you can use .Release.Revision.

This works well for me, with the downside being that Helm does not clean up the secrets created by previous releases, resulting in a bunch of orphaned secrets lying around in k8s.

rlaveycal commented 3 years ago

The orphaned k8s secrets are the responsibility of the driver, not Helm. They have the label secrets-store.csi.k8s.io/managed=true on them, but I can't see any logic that deletes them.

mmerickel commented 3 years ago

I have since implemented that stop-gap solution, and for us it has been properly cleaning up the secrets. I will say we are creating a new SecretProviderClass and a new secretName for its secretObjects[0].secretName, both using {{ include "chart.fullname" . }}-{{ .Release.Revision }} - so far no orphaned secrets this way. A sketch of the pattern is below.
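
This is only a rough sketch of what I mean, not a complete chart: it assumes a chart scaffolded by helm create (so the chart.fullname named template exists), the Azure provider, and the v1 SecretProviderClass API; the provider-specific parameters are omitted.

# templates/secretproviderclass.yaml (sketch of the revision-suffix workaround)
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
spec:
  provider: azure
  parameters: {}   # provider-specific parameters omitted for brevity
  secretObjects:
    - secretName: {{ include "chart.fullname" . }}-{{ .Release.Revision }}
      type: Opaque
      data:
        - key: mysecret1
          objectName: mysecret1

The Deployment's secretKeyRef and the CSI volume's secretProviderClass attribute have to reference the same revision-suffixed names, so every helm upgrade rolls the pods onto a freshly created secret.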

amarkevich commented 2 years ago

@aramase can the rotation reconciler be configured with rotationPollInterval 0 to perform the secret update when the volume is mounted?

erjosito commented 2 years ago

@aramase any update on this? We are seeing this issue impacting many AKS users...

aramase commented 2 years ago

> @aramase any update on this? We are seeing this issue impacting many AKS users...

@erjosito Could you provide more details on the exact flow of events and the impact? Is the secret rotation feature enabled? If yes, what is the rotation poll interval?

erjosito commented 2 years ago

@aramase my bad, we hadn't enabled secret rotation (shame on me!). We will have a look at it. Thanks for your answer!

ryuzakyl commented 2 years ago

I've tried mmerickel's solution ({{ .Values.myCustomName }}-{{ .Release.Revision }}) and it works for me (partially). Now the new secret is created and used properly in the deployment ;).

However, I do get the orphaned secrets.

$ kubectl get secret -l secrets-store.csi.k8s.io/managed=true

NAME                    TYPE     DATA   AGE
svc3-params             Opaque   21     3d
svc2-params             Opaque   14     7d15h
svc1-params             Opaque   27     7d15h
svc1-params-18          Opaque   28     26h
svc1-params-19          Opaque   28     26h
svc1-params-21          Opaque   28     26h
svc1-params-23          Opaque   28     25h
svc1-params-24          Opaque   28     3h52m

Any ideas on why that is happening or how to troubleshoot this issue?

markandersontrocme commented 2 years ago

I have the same issues as @ryuzakyl with orphaned secrets.

@mmerickel or anyone else have a solution for this? It means every time we upgrade our release a new secret is created, and therefore a new orphaned secret is left behind...

mmerickel commented 2 years ago

I did have some orphan secrets. It does clean up some of the time.

Unfortunately this repo has really not been seeing much support from AWS, so I switched to external-secrets and Stakater's Reloader and have been happy with it. What pushed me over the edge was that external-secrets works well with IRSA, has great template support, and the reload support in the CSI driver is global (and alpha) instead of per secret.

The CSI driver has some nice features, like making the pod wait while secrets get mounted and the ability to not use k8s secrets at all if you want, but it just isn't getting enough love lately.

ArgoCD also has poor support for .Release.Revision in Helm, which triggered me to look for a solution that was release-agnostic as well.

markandersontrocme commented 2 years ago

@aramase Are there any plans to better support this? I tried using rotation; while it works well, it also easily incurs extra charges, as I would want to set the polling interval low (1m or so).

Using -{{ .Release.Revision }} in the SecretProviderClass and Secret names works well; however, it leaves orphaned secrets, so having a mechanism to clean those up would be ideal.

Maybe we can use SecretProviderClassPodStatus to find unused secrets?

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

markandersontrocme commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 1 year ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

vl-shopback commented 1 year ago

/remove-lifecycle stale

Preen commented 1 year ago

It's a shame that updating the SecretProviderClass doesn't update the secrets unless you have rotation set up.

giordanocardillo commented 1 year ago

/remove-lifecycle stale

marcio-ch commented 1 year ago

This design choice has not been reviewed since 2020. @Dyllaann made a very reasonable suggestion to add a flag to control the behavior (pod1 & pod2 inconsistency vs. External Secret & K8s Secret inconsistency), but the maintainer simply ignored it.

What a shame, this would be a great driver if that "feature" were reviewed.

aramase commented 1 year ago

> This design choice has not been reviewed since 2020. @Dyllaann made a very reasonable suggestion to add a flag to control the behavior (pod1 & pod2 inconsistency vs. External Secret & K8s Secret inconsistency), but the maintainer simply ignored it.
>
> What a shame, this would be a great driver if that "feature" were reviewed.

@marcio-ch Appreciate you taking the time to comment on the issue. Adding flags to control the behavior just leads to more levels of inconsistency between existing pods that are already running and the data in new pods for the exact same SecretProviderClass.

Just want to let everyone know, this is not being ignored. The driver can't be forced to react to changes to the SecretProviderClass, because the primary purpose of the driver is to be invoked by kubelet at the time of volume mount. The sync-as-Kubernetes-secret feature is not the right fit alongside the default mount, and we (all the maintainers) in the community have been discussing splitting the sync-as-Kubernetes-secret controller into a separate project. Once split, the driver will be used only for volume mount, and the separate controller will handle creation of the Kubernetes secret in addition to reacting to changes in the SecretProviderClass. Hope this helps clarify what we're doing! If you're interested in more info, you can also join our bi-weekly community calls to see the project direction and the features we are working on.

marcio-ch commented 1 year ago

Thanks for the reply. Just to give more color on what I said: I can see that, in the face of a decision, you took the option that you thought was better; I just disagree with it.

What you did was to disregard the external secret as the source of truth for the secret value in the pods: if the value is updated in the external secret but that change is not reflected in the k8s secret / volume mount, then the k8s secret / volume mount becomes the de facto source of truth for the pods, independently of the external secret updates, until it is manually changed. That's not intuitive.

What you call "inconsistency" between the pods created before and after the secret change is actually the normal and, I would say, expected behavior. To avoid that "inconsistency" you stopped addressing the most common use case for external secrets, which is "if my external secret is updated, every pod created after this update should get the updated value".

In other words, to avoid the inconsistency between the pods created before and after the secret change, you created a worse (in my opinion) inconsistency between the external secret and the k8s secret / volume mount.

I was able to address this problem by using the current community workaround, which is to enable the secret rotation polling strategy; it works OK. Judging by the number of users of this driver relying on the same workaround, you can see that a very important use case, to say the least, is not being addressed by default.

Not to sound only bitter: the driver works great when correctly configured. Thanks for the work you did here; together with Reloader it allowed me to greatly improve our secret management at my company. 👍
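
For context, the Reloader pairing looks roughly like the sketch below. The annotation name assumes Stakater Reloader's documented default, and secret-object stands in for the Kubernetes secret synced from the SecretProviderClass; treat it as an illustration, not a drop-in manifest.

# Deployment excerpt (sketch): Reloader rolls the pods whenever the synced
# secret is updated by the rotation reconciler.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  annotations:
    secret.reloader.stakater.com/reload: "secret-object"   # watch this secret for changes
spec:
  template:
    spec:
      containers:
        - name: app
          image: my-app:1.0.0
          envFrom:
            - secretRef:
                name: secret-object   # Kubernetes secret created by the CSI driver's sync feature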

pavan359t commented 10 months ago

@marcio-ch would you please provide an example with Reloader and the SecretProviderClass of the CSI driver?

k8s-triage-robot commented 7 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 6 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 5 months ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 5 months ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes-sigs/secrets-store-csi-driver/issues/389#issuecomment-2084905502):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.