hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Vault Agent Injector not injecting secrets into pods running on EKS cluster #27788

Open dancook993 opened 1 month ago

dancook993 commented 1 month ago

Hello,

I have a three-node Vault cluster with Raft storage running hashicorp/vault:1.8.0 on my EKS production cluster. In that cluster, I have a Vault Agent Injector running vault-k8s:0.11.0, which is successfully mounting secrets into pods. The EKS version of this cluster is 1.22.

In my staging cluster, I have a Vault Agent Injector running vault-k8s:0.11.0. It connects to the production Vault via its public ingress name. The EKS version of this cluster is 1.25. We upgraded from 1.21 -> 1.25, and somewhere during this upgrade the Vault Agent Injector stopped injecting secrets into pods.

The logs I see in the stage vault agent injector are:

2024-07-15T18:20:15.104Z [INFO] handler: Starting handler..
Listening on ":8080"...
2024-07-15T18:20:15.188Z [INFO] handler.auto-tls: Generated CA
2024-07-15T18:20:15.188Z [INFO] handler.certwatcher: Updated certificate bundle received. Updating certs...
2024-07-15T18:20:36.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:20:40.768Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.087Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:06.926Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:10.379Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:21:35.591Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:07.043Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:22:39.532Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:05.544Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:23:07.980Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:24:43.173Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s
2024-07-15T18:25:00.118Z [INFO] handler: Request received: Method=POST URL=/mutate?timeout=30s

I have looked through the EKS API server logs for errors related to the mutate requests, but the requests seem to be passing as expected. Nothing has changed in either of our Vault deployments other than the EKS version.

The mutating webhook configuration looks like this: webhooks:

The pod where we are trying to have the secret mounted has the following annotations:

vault.hashicorp.com/agent-configmap: secrets-updater
vault.hashicorp.com/agent-inject: true

These are the same annotations used with the production Vault Agent Injector, where injection is working.

Does anyone know where best to look for further errors or information? I thought the kube-apiserver logs would be the best place, but I didn't see any mutate errors there. Without any errors from the Vault agent, it is very difficult to troubleshoot. Setting the log level to debug doesn't help either.
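One way to narrow this down is to check whether the webhook actually changed the pod spec, as opposed to the agent being injected and then failing later. Below is a minimal Python sketch, assuming the pod JSON comes from `kubectl get pod <pod> -o json` and that the injector uses its default names: the `vault.hashicorp.com/agent-inject-status` annotation and the `vault-agent-init` / `vault-agent` containers. The function name `injection_report` is hypothetical.

```python
import json

# Assumed vault-k8s defaults; adjust if your injector is configured differently.
INJECT_ANNOTATION = "vault.hashicorp.com/agent-inject"
STATUS_ANNOTATION = "vault.hashicorp.com/agent-inject-status"
AGENT_CONTAINERS = {"vault-agent-init", "vault-agent"}


def injection_report(pod: dict) -> dict:
    """Summarize whether a pod object shows signs of a successful mutation."""
    annotations = pod.get("metadata", {}).get("annotations") or {}
    spec = pod.get("spec", {})
    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    names = [c.get("name") for c in containers]
    return {
        # What the workload asked for.
        "requested": annotations.get(INJECT_ANNOTATION),
        # Set by the injector after it mutates the pod; None means no mutation.
        "status": annotations.get(STATUS_ANNOTATION),
        # Agent containers actually present in the pod spec.
        "agent_containers": sorted(n for n in names if n in AGENT_CONTAINERS),
    }


# Example with a pod that requested injection but was never mutated.
example_pod = {
    "metadata": {"annotations": {INJECT_ANNOTATION: "true"}},
    "spec": {"containers": [{"name": "app"}]},
}
print(json.dumps(injection_report(example_pod), indent=2))
```

If `status` is missing and no agent containers are present, the mutation itself is not being applied even though the handler logs the request. If they are present, the problem is more likely in the agent's authentication to Vault, and the `vault-agent-init` container logs would be the next place to look.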

dancook993 commented 1 month ago

Maybe this is due to the service account changes from EKS 1.23 -> 1.24, as the agent injector itself seems to be able to mutate correctly.
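For context, Kubernetes 1.24 stopped auto-generating Secret-based tokens for service accounts, so pods rely on projected, time-bound tokens instead. Whether that is the cause here is unconfirmed. One way to test the hypothesis is to inspect the claims of the token a staging pod would present to Vault and compare them with what the Kubernetes auth mount expects. The sketch below decodes a JWT payload without verifying the signature; the function names are hypothetical, and treating the `kubernetes.io` claim as the marker of a bound token is an assumption about the token layout.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = parts[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def summarize_claims(claims: dict) -> dict:
    """Pick out the claims most relevant to a Kubernetes auth comparison."""
    return {
        "iss": claims.get("iss"),
        "aud": claims.get("aud"),
        "exp": claims.get("exp"),
        # Assumption: bound (projected) tokens carry a nested "kubernetes.io" claim.
        "bound": "kubernetes.io" in claims,
    }
```

Inside a pod, the token is normally mounted at /var/run/secrets/kubernetes.io/serviceaccount/token, so something like `summarize_claims(decode_jwt_claims(open(path).read()))` would show its issuer, audience, and expiry. This is for inspection only and should not be used to validate tokens.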