hashicorp / vault-secrets-operator

The Vault Secrets Operator (VSO) allows Pods to consume Vault secrets natively from Kubernetes Secrets.
https://hashicorp.com

VDS fail with EntityAlreadyExists after a deployment restart #804

Closed sebglon closed 1 week ago

sebglon commented 4 weeks ago

Describe the bug After a rolling upgrade of the deployment from 0.6.0 to 0.7.1, the VDS fails to reconcile due to an EntityAlreadyExists error.

To Reproduce Steps to reproduce the behavior:

  1. Deploy a VDS for AWS
  2. Upgrade VSO to 0.7.1
  3. Check the logs
  4. See error (vault-secrets-operator logs, application logs, etc.)
{"level":"error","ts":"2024-06-07T09:02:52Z","logger":"syncSecret","msg":"Vault request failed","controller":"vaultdynamicsecret","controllerGroup":"secrets.hashicorp.com","controllerKind":"VaultDynamicSecret","VaultDynamicSecret":{"name":"app-aws-account","namespace":"default"},"namespace":"default","name":"app-aws-account","reconcileID":"99454ef2-ec1c-41d2-8eb2-671c805d478a","path":"aws/creds/app-global","method":"GET","error":"Error making API request.\n\nURL: GET https://vault.vault.svc.cluster.local:8200/v1/aws/creds/app-global\nCode: 400. Errors:\n\n* Error creating IAM user: EntityAlreadyExists: User with name vault-production-a-kubernetes-app-app-global--17 already exists.\n\tstatus code: 409, request id: 70cd9200-4210-4c31-bb29-586a1d2d1fba"}
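When triaging many such entries, it can help to pull the conflicting IAM user name out of the structured log. The following is a minimal sketch, assuming the operator emits one JSON object per line as in the entry above; `LOG_LINE` is a shortened, hand-built copy of that entry and `conflicting_iam_user` is a hypothetical helper, not part of VSO.

```python
import json
import re
from typing import Optional

# Shortened copy of the log entry above (most fields dropped for brevity).
LOG_LINE = json.dumps({
    "level": "error",
    "msg": "Vault request failed",
    "path": "aws/creds/app-global",
    "error": (
        "Error making API request.\n\nCode: 400. Errors:\n\n"
        "* Error creating IAM user: EntityAlreadyExists: User with name "
        "vault-production-a-kubernetes-app-app-global--17 already exists."
    ),
})


def conflicting_iam_user(log_line: str) -> Optional[str]:
    """Return the IAM user name from an EntityAlreadyExists error, or None."""
    entry = json.loads(log_line)
    match = re.search(r"User with name (\S+) already exists", entry.get("error", ""))
    return match.group(1) if match else None


print(conflicting_iam_user(LOG_LINE))
```

The trailing `--17` in the extracted name is the detail worth noticing: the name ends right after what looks like the first two digits of a timestamp.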

Expected behavior The AWS SA is not lost during a restart

Environment

benashz commented 3 weeks ago

Hi @sebglon, from the log output provided it looks like the Vault request is failing. The error message appears to be coming from the AWS API. Would you mind sharing your Vault configuration for the role, as well as the YAML of the VDS custom resource?

sebglon commented 3 weeks ago

Our VDS has existed for 200 days and we had no errors before the VSO upgrade:

apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  creationTimestamp: "2023-11-15T16:12:47Z"
  finalizers:
  - vaultdynamicsecret.secrets.hashicorp.com/finalizer
  generation: 2
  name: minio-encrypted-aws-account
  namespace: default
  resourceVersion: "1238700271"
  uid: 18dfc336-6fd8-4aa7-a7f7-828c496e1a3e
spec:
  destination:
    create: true
    name: app-aws-account
    overwrite: false
    transformation: {}
  mount: aws
  path: creds/app-global
  renewalPercent: 67
  rolloutRestartTargets:
  - kind: Deployment
    name: app-encrypted-aws
  vaultAuthRef: app
status:
  lastGeneration: 2
  lastRenewalTime: 1717764893
  lastRuntimePodUID: 4b0c8732-8f57-4a49-8aa9-410329c1ee9e
  secretLease:
    duration: 86400
    id: aws/creds/app-global/kr07SEYhHYJ1CXwSuUz2ct0J
    renewable: true
    requestID: 47b99871-5ec0-6ccd-ef2e-108941adaf61
  staticCredsMetaData:
    lastVaultRotation: 0
    rotationPeriod: 0
    ttl: 0
  vaultClientMeta:
    cacheKey: kubernetes-86ecfdce4979417676d79d
    id: a8803b50c4421181de78d864134f342343b4240484038db220b6150d2fcbf0a0

sebglon commented 3 weeks ago

After deleting the SA on AWS and the generated secret in Kubernetes, the VDS works well.

sebglon commented 3 weeks ago

Every time we restart the master VSO pod, we get the error: Error creating IAM user: EntityAlreadyExists: User with name vault-production-a-kubernetes-app-app-global--17 already exists

sebglon commented 1 week ago

We hit the issue every 2 days, which matches our lease_max:

vault read -tls-skip-verify aws/config/lease
Key          Value
---          -----
lease        24h0m0s
lease_max    48h0m0s
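The arithmetic behind the 2-day pattern can be sketched as follows. This assumes that `lease_max` caps the total lifetime of a lease, so that a brand-new credential (and therefore a new IAM user) has to be requested at least that often, and it applies `renewalPercent` as a plain percentage of the lease duration; any jitter or other adjustment the operator may apply is ignored.

```python
from datetime import timedelta

lease = timedelta(hours=24)       # `lease` from aws/config/lease
lease_max = timedelta(hours=48)   # `lease_max` from aws/config/lease
renewal_percent = 67              # spec.renewalPercent on the VDS


def renewal_point(ttl: timedelta, percent: int) -> timedelta:
    """Time after issuance at which `percent` of the TTL has elapsed."""
    return ttl * percent / 100


first_renewal = renewal_point(lease, renewal_percent)
print(first_renewal)  # 16:04:48
print(lease_max)      # 2 days, 0:00:00
```

Under these assumptions renewals keep the same IAM user alive, and only the fresh request forced by the 48h cap creates a new user, which is where an `EntityAlreadyExists` error could surface.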

But we have 2 environments with the same config and only one has this issue...

Just after deleting the generated secret, we see these events on the VDS:

  Type    Reason                   Age                   From                Message
  ----    ------                   ----                  ----                -------
  Normal  RolloutRestartTriggered  5m50s (x3 over 2d3h)  VaultDynamicSecret  Rollout restart triggered for {Deployment app-encrypted-aws}
  Normal  SecretRotated            5m50s                 VaultDynamicSecret  Secret synced, lease_id="aws/creds/app-global/2Ias6NDSAf1T6eOd1z6WNV9w", horizon=18h27m37.402596052s, sync_reason="InexistentDestination"

sebglon commented 1 week ago

The issue is related to the IAM user name generated from the username template: it is too long and gets truncated by Vault.
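To illustrate why truncation can produce this error, here is a minimal sketch. It does not reproduce Vault's actual template: `build_username` and the name layout `vault-<display name>-<policy>-<unix time>-<random>` are assumptions chosen for illustration. The only external fact relied on is that AWS limits IAM user names to 64 characters. When the prefix is long enough, the cut removes the random suffix and most of the timestamp, so every request yields the same name.

```python
import secrets
import time

IAM_USER_NAME_MAX = 64  # AWS limit on IAM user name length


def build_username(display_name, policy_name, now=None):
    """Hypothetical template, cut to the IAM limit (not Vault's real template)."""
    now = int(time.time()) if now is None else now
    full = f"vault-{display_name}-{policy_name}-{now}-{secrets.token_hex(10)}"
    return full[:IAM_USER_NAME_MAX]


# Hypothetical long display name, padded so the prefix fills 62 characters.
long_display = "production-a-kubernetes-app".ljust(44, "x")

first = build_username(long_display, "app-global", now=1717764893)
second = build_username(long_display, "app-global", now=1717937693)  # 2 days later

print(first)
print(first == second)  # the unique part was truncated away

# With a short prefix the random suffix survives and names differ.
print(build_username("app", "app-global") == build_username("app", "app-global"))
```

In this sketch both long names end in `-17`, the first two digits of the timestamp, which resembles the `--17` suffix in the reported error. Shortening the fixed part of the name, or truncating it before the unique suffix is appended, keeps the names distinct.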