ratify-project / ratify

Artifact Ratification Framework
https://ratify.dev
Apache License 2.0
189 stars 56 forks source link

No retry mechanism the cert fetch #1131

Open duffney opened 8 months ago

duffney commented 8 months ago

What happened in your environment?

I installed Ratify on my AKS cluster before the appropriate access policy was assigned to the managed identity used by Ratify to connect to an Azure Key Vault instance. Once I noticed the issue, I assigned the appropriate access policy to the identity, but without a retry for the certificate fetch I was forced to uninstall and reinstall Ratify on the cluster. @akashsinghal mentioned it's also possible to delete the certstore-akv certificatestore to force a new fetch to occur.

What did you expect to happen?

I expected that Ratify would retry the fetch and resolve the cert issues.

What version of Kubernetes are you running?

1.26.3

What version of Ratify are you running?

v1.0.0

Anything else you would like to add?

No response

Are you willing to submit PRs to contribute to this bug fix?

Steps to reproduce the error

  1. Clone https://github.com/duffney/secure-supply-chain-on-aks
  2. Run cd terraform, terraform init && terraform apply --auto-approve
  3. Set environment vars from terraform output
terraform init && terraform apply --auto-approve;
export GROUP_NAME="$(terraform output -raw rg_name)"
export AKS_NAME="$(terraform output -raw aks_name)"
export VAULT_URI="$(terraform output -raw akv_uri)"
export KEYVAULT_NAME="$(terraform output -raw akv_name)"
export ACR_NAME="$(terraform output -raw acr_name)"
export CERT_NAME="$(terraform output -raw cert_name)"
export TENANT_ID="$(terraform output -raw tenant_id)"
export CLIENT_ID="$(terraform output -raw wl_client_id)"
  1. Get AKS creds az aks get-credentials --resource-group ${GROUP_NAME} --name ${AKS_NAME}
  2. Deploy Gatekeeper
helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts

helm install gatekeeper/gatekeeper  \
--name-template=gatekeeper \
--namespace gatekeeper-system --create-namespace \
--set enableExternalData=true \
--set validatingWebhookTimeoutSeconds=5 \
--set mutatingWebhookTimeoutSeconds=2
  1. Deploy Ratify
helm repo add ratify https://deislabs.github.io/ratify

helm install ratify \
    ratify/ratify --atomic \
    --namespace gatekeeper-system \
    --set akvCertConfig.enabled=true \
    --set featureFlags.RATIFY_CERT_ROTATION=true \
    --set akvCertConfig.vaultURI=${VAULT_URI} \
    --set akvCertConfig.cert1Name=${CERT_NAME} \
    --set akvCertConfig.tenantId=${TENANT_ID} \
    --set oras.authProviders.azureWorkloadIdentityEnabled=true \
    --set azureWorkloadIdentity.clientId=${CLIENT_ID}
  1. Deploy template and constraint
kubectl apply -f  manifests/template.yaml
kubectl apply -f  manifests/constraint.yaml
  1. Deploy the manifests
kubectl apply -f /manifests
  1. Describe certificatestore
kubectl describe certificatestore certstore-akv --namespace gatekeeper-system
akashsinghal commented 8 months ago

cc: @susanshi

susanshi commented 8 months ago

Hi @duffney Josh, The certificate store reconcile is only triggered when the CR is modified. In the case of permission change, since the action is external to ratify, customer would need to manually trigger a fetch operation by deleting and applying the CR again. I will add a TSG for this work around , thank you for the submitting this feedback.

It is also an option to implement to scheduled sync, we will need to discuss if this setting should be configurable, and the if certificate should be evited if fetch operation fails.

susanshi commented 8 months ago

TSG PR submitted for review :https://github.com/deislabs/ratify-web/pull/28

yizha1 commented 3 months ago

@susanshi @akashsinghal I would like to discuss this issue in the community meeting this week.

susanshi commented 1 month ago

Hi @duffney , the secrets store csi driver supports auto rotation, we could see if there are anything we could reference from their design.