Open edison-vflow opened 2 months ago
cc @phbelitz @chrysogonus
@phbelitz @chrysogonus, another way this could be fixed with less effort is to allow the Connaisseur deployment to add custom annotations from the values.yaml file.
Currently the annotations that can be added on the deployment are restricted.
If we can allow custom annotations here,
then we can use something like https://github.com/stakater/Reloader, where we add the annotation:
reloader.stakater.com/auto: "true"
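As a rough sketch of where that annotation would sit, assuming Reloader is already installed in the cluster. The Deployment name and namespace below are placeholders for illustration, not values taken from the Connaisseur chart, and the `spec` is omitted:

```yaml
# Fragment only: metadata of the Deployment that Reloader should watch.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: connaisseur-deployment   # hypothetical name
  namespace: connaisseur         # hypothetical namespace
  annotations:
    # Ask Reloader to roll this Deployment whenever a ConfigMap or
    # Secret referenced by its pod template is updated.
    reloader.stakater.com/auto: "true"
```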
I see that the ecr-credentials secret ends up being included in the deployment as a volume mount, which makes it a perfect candidate for Reloader to restart the deployment:
volumes:
  - name: certs
    secret:
      secretName: connaisseur-tls
  - name: redis-certs
    secret:
      secretName: connaisseur-redis-tls
  - name: app-config
    configMap:
      name: connaisseur-app-config
  - name: connaisseur-alert-config
    configMap:
      name: connaisseur-alert-config
  - name: ecr-credentials-volume
    secret:
      secretName: ecr-credentials
I have played around with this idea further by manually editing the Connaisseur deployment to add Reloader annotations covering the ecr-credentials secret.
Reloader can do a rollout of a deployment when either a configmap or a secret it is using is changed. This does not include the case where the secret is destroyed and recreated.
When we refresh the ecr-credentials secret the way https://github.com/nabsul/k8s-ecr-login-renew does it, we actually modify the value of the existing secret. This worked perfectly: the Connaisseur deployment was restarted and was able to refresh the token from the modified secret.
So this is just to share ideas on one way token refresh could be achieved when using the auth.secretName validators mechanism.
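If restarting on every mounted ConfigMap or Secret is too broad, Reloader's README also describes an annotation that names specific secrets. A minimal sketch, assuming only the ecr-credentials secret should trigger a rollout:

```yaml
# Fragment of the Deployment metadata; everything else is omitted.
metadata:
  annotations:
    # Roll the Deployment only when this Secret is updated in place.
    secret.reloader.stakater.com/reload: "ecr-credentials"
```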
@edison-vflow yikes, thank you for that very thorough digging into the problem. if i understand correctly, this whole ordeal need not be done when using the useKeychain feature (which, as already correctly reported in https://github.com/sse-secure-systems/connaisseur/issues/1766, is wrongly documented), am i right?
Hi @phbelitz, this issue is independent of https://github.com/sse-secure-systems/connaisseur/issues/1766
Connaisseur supports multiple mechanisms to log in to your repositories when it is doing cosign validation.
We tried 2 of them:
1. k8s_keychain: https://sse-secure-systems.github.io/connaisseur/v3.6.1/validators/sigstore_cosign/#k8s_keychain
2. dockerconfigjson: https://sse-secure-systems.github.io/connaisseur/v3.6.1/validators/sigstore_cosign/#dockerconfigjson
The issue raised here is for 2. Using dockerconfigjson
When using the dockerconfigjson mechanism, where you specify auth.secretName as your authentication mechanism to the docker image repositories, we found that Connaisseur correctly reads the authentication details from the secret that you give it via auth.secretName.
Connaisseur then keeps the authentication details it read from the secret in memory for as long as its pods are running. However, the credentials of most docker repositories expire after 6 to 12 hrs. When the credentials expire, Connaisseur starts failing validation because it holds stale credentials. Connaisseur has no mechanism to automatically refresh the credentials loaded from the secret it is referencing, which is why we start having validation failures after token expiration.
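For orientation, a validator using this mechanism might look roughly like the sketch below. This is an illustration only, not the configuration from this report: the field layout is assumed from the `auth.secretName` wording and the cosign validator docs linked above, the validator name is arbitrary, and the key is a placeholder. Check the documentation for your Connaisseur version before relying on it.

```yaml
# Illustrative sketch; field layout assumed, values are placeholders.
validators:
  - name: example-cosign-validator   # hypothetical name
    type: cosign
    trustRoots:
      - name: default
        # Placeholder for the cosign public key.
        key: |
          -----BEGIN PUBLIC KEY-----
          REPLACE_WITH_YOUR_PUBLIC_KEY
          -----END PUBLIC KEY-----
    auth:
      # Secret holding the dockerconfigjson credentials. As described
      # above, its contents are read once and then kept in memory.
      secretName: ecr-credentials
```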
The interesting thing is that, on our side, we have a mechanism where we fetch new credentials before they expire, so that the secret is repopulated. So when Connaisseur starts failing validation, due to expired credentials, if we manually restart the Connaisseur deployment, it is able to start working again because it gets the updated values from the secret.
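For reference, a manual restart of this kind is usually done with `kubectl rollout restart`, which works by stamping the pod template with a timestamp annotation so that new pods are created. A sketch of the resulting change to the Deployment (the timestamp is a placeholder):

```yaml
# Fragment of the Deployment after a rollout restart.
spec:
  template:
    metadata:
      annotations:
        # Changing this value alters the pod template, which triggers
        # a rolling replacement of the pods.
        kubectl.kubernetes.io/restartedAt: "2024-01-01T00:00:00Z"   # placeholder
```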
In Kubernetes clusters, this is a common issue that a deployment may need to restart when its associated configMaps or secrets are updated.
The issue that Connaisseur is not able to pick up the refreshed tokens could be resolved if the Connaisseur Kubernetes deployment allowed modifications to its annotations, so that tools like Reloader can restart Connaisseur when its associated secrets are changed.
The Connaisseur deployment annotations are not modifiable according to this code: connaisseur/charts/connaisseur/templates/deployment.yaml
We should be able to allow custom annotations coming in from values.yaml; see my comments in the first messages.
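One way this could look, as a sketch only. The `deploymentAnnotations` key is hypothetical and not part of the current chart; it is used here purely to illustrate the idea:

```yaml
# values.yaml (hypothetical key)
deploymentAnnotations:
  reloader.stakater.com/auto: "true"
```

```yaml
# templates/deployment.yaml (metadata fragment only, sketch)
metadata:
  {{- with .Values.deploymentAnnotations }}
  annotations:
    # Render every user-supplied annotation under metadata.annotations.
    {{- toYaml . | nindent 4 }}
  {{- end }}
```

If the template already emits fixed annotations, the user-supplied ones would have to be rendered inside that existing `annotations:` block instead of opening a second one.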
--
Issue 1. Using k8s_keychain
https://sse-secure-systems.github.io/connaisseur/v3.6.1/validators/sigstore_cosign/#k8s_keychain is a separate issue: when you use the keychain authentication mechanism, the code implementation differs from the documentation. We need the docs to be updated, because the k8s_keychain capability is broken if you follow the public documentation guidelines.
@edison-vflow ok, I see three options:
- implement the acquisition of renewed tokens inside Connaisseur (which in parts is already implemented by useKeychain), which takes a lot of coding and time. don't like it.
- let Connaisseur read k8s secrets, instead of mounting them. this way Connaisseur will always pick up the newest version of a secret, without needing a restart to remount it. but this would also mean Connaisseur needs more permissions (ability to read secrets in Connaisseur namespace through API) and this doesn't feel like the way kubernetes intended things to work. so ideally i wanna pass on that too.
- your annotation option. i have zero issues with adding customizable annotation to the connaisseur deployment. we could have done that earlier, the use case for that just never popped up (until now).
so i guess i'll prepare a PR with customizable annotations.
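To make the extra permissions of the second option concrete, a minimal sketch of the kind of Role it would require. The names and namespace are hypothetical, and a RoleBinding to Connaisseur's service account would also be needed:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: connaisseur-secret-reader   # hypothetical name
  namespace: connaisseur            # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    # Restrict access to the one Secret that holds the registry credentials.
    resourceNames: ["ecr-credentials"]
    verbs: ["get"]
```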
@phbelitz, first prize is really to have Connaisseur be self-contained. My thinking was that we can expose the customizable annotations as a temporary fix.
We explored this route of customizable annotations when auth.keychain was broken.
So we thought to ourselves that at least auth.secretName is partially working.
How can we retrofit the partially working solution to work fully, given that we have minimal access to the codebase?
That's when the idea came in of using tools like Reloader, which read a deployment's annotations and restart the deployment when its secrets or configmaps change.
It was a workaround to give us full functionality, given that we were stuck if we wanted to use the latest version, as all auth mechanisms we had evaluated were having issues.
I propose that we enable customizable annotations as a temporary solution to help people who go the auth.secretName route.
Otherwise we should go all the way and allow the auth.secretName mechanism to refresh tokens by itself.
If not, it is better not to offer it as an authentication alternative, because the complexity of troubleshooting these issues is very high.
We will end up with people silently quitting and not using the product if we offer an option that requires other intricacies for it to fully work.
Describe the bug
When using Connaisseur application version 3.6.1 and chart version 2.6.1 on EKS v1.30, with cosign validators where auth.secretName is used and ECR is the image registry, Connaisseur validates images correctly at first, but after about 6 hrs validation starts failing with
The validator section is defined as
The issue happens for the awsvalidator that needs ECR credentials provided via the secret ecr-credentials.
On the initial run, validation works for about 6 hrs. This time sometimes varies. After the 6 or so hours, we start getting the error highlighted above.
At the moment that validation starts failing, various operations in the cluster are blocked, like rollout of deployments.
What we notice is that if we restart Connaisseur, then validation starts working again, until the next expiration.
We have a cronjob that runs every 6 hrs, to cater for the fact that the ECR token expires after 12 hrs. This cronjob refreshes the ecr-credentials secret that the Connaisseur validator is using. For refreshing the token every 6 hrs, before expiration, we are using https://github.com/nabsul/k8s-ecr-login-renew. The refresh seems to be working, as a restart of Connaisseur always works and the restarted pods use the refreshed token.
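A skeleton of such a schedule, as a sketch only. The names, namespace, and service account are hypothetical, and the container image and its settings are left as a placeholder because they depend on the renewal tool's own documentation:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ecr-credentials-renew       # hypothetical name
  namespace: connaisseur            # hypothetical namespace
spec:
  # Run every 6 hours, well inside the 12-hour token lifetime.
  schedule: "0 */6 * * *"
  # Do not start a new run while a previous one is still going.
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          # Hypothetical account; it needs rights to update the Secret.
          serviceAccountName: ecr-credentials-renew
          restartPolicy: Never
          containers:
            - name: renew
              # Placeholder: use the image and environment settings
              # given in the renewal tool's README.
              image: "REPLACE_WITH_RENEWER_IMAGE"
```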
It looks like the Connaisseur validator that uses the auth.secret mechanism reads the token once at startup but has no way of re-reading the token when it is refreshed in the ecr-credentials secret, the same secret it reads from at start-up.
Would this explain why a restart of Connaisseur seems to always fix the issue?
Another test we did was to explicitly run the token renewal job at the time Connaisseur validation fails, to force a token refresh. The credentials are renewed but they are not picked up by a running instance of Connaisseur.
Could you give guidance on how best to solve this issue, or perhaps on what other clients that are using auth.secret for cosign are doing to always have Connaisseur use the latest token?
Expected behavior
Optional: To reproduce