hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Agent injection : Forbidden: pod updates may not add or remove containers #11064

Open pbriet opened 3 years ago

pbriet commented 3 years ago

Describe the bug

At some point, the K8S injection stops working correctly. Pods are stuck in the "ContainerCreating" state with the following error:

failed to set annotation on pod xxx: Pod "xxx" is invalid: spec.containers: Forbidden: pod updates may not add or remove containers

It fails on any Deployment (see the example below). Curiously, it does not fail when pods are created directly.

I tried restarting the injection agent, but that changed nothing.

To Reproduce

Not sure. It happened twice, in two separate OKD clusters:

1. Use Vault injection for some time, successfully
2. Disable Vault injection on the given deployment (remove the annotations)
3. Re-enable Vault injection (add the annotations back)
4. Failure (even when creating other deployments)

Once in the failure state, it is always reproducible.

Expected behavior

Vault should continue to inject my secrets.

Environment:

Example of failing deployment

A pretty simple one:

kind: Deployment
apiVersion: apps/v1
metadata:
  name: failure
  namespace: app-experiments
spec:
  replicas: 1
  selector:
    matchLabels:
      app: failure
  template:
    metadata:
      labels:
        app: failure
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/role: default-experiment
        vault.hashicorp.com/agent-inject-secret-s3-credentials: secret/experiments/file-manager

    spec:
      containers:
        - resources:
          name: test
          image: public.ecr.aws/bitnami/nginx:latest

But the following Pod spec works correctly:

kind: Pod
apiVersion: v1
metadata:
  annotations:
    vault.hashicorp.com/agent-inject: 'true'
    vault.hashicorp.com/agent-inject-secret-s3-credentials: secret/experiments/file-manager
    vault.hashicorp.com/role: default-experiment
  name: success-single-pod
  namespace: app-experiments
spec:
  containers:
    - name: test
      image: 'public.ecr.aws/bitnami/nginx:latest'

Vault Agent is correctly injected, and secrets mounted.

Logs from the vault-agent-injector pod look healthy in both cases:

2021-03-09T14:33:57.032Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=10s
2021-03-09T14:33:57.038Z [DEBUG] handler: checking if should inject agent..
2021-03-09T14:33:57.038Z [DEBUG] handler: checking namespaces..
2021-03-09T14:33:57.038Z [DEBUG] handler: setting default annotations..
2021-03-09T14:33:57.038Z [DEBUG] handler: creating new agent..
2021-03-09T14:33:57.038Z [DEBUG] handler: validating agent configuration..
2021-03-09T14:33:57.038Z [DEBUG] handler: creating patches for the pod..

It looks like some kind of mutating webhook issue. It might be related to OKD; I'm not sure where to look.

pbriet commented 3 years ago

It also fails when creating the pod directly... through ArgoCD. Basically, it seems to fail whenever the pod is not created by the user directly.

HridoyRoy commented 3 years ago

Hi folks, I'm not sure I understand the context here. Are there logs that point to issues with vault, versus infrastructure/setup issues? Thanks!

pbriet commented 3 years ago

Hi,

Well, there isn't any certainty. The only logs are the pod events. The pod fails to be created when Vault tries to insert the sidecar.

I guess the vault agent is applying a MutatingWebHook on the pod. The bug might come from the way the pod is mutated.
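To make the guess above concrete, here is a minimal Python sketch of how a mutating admission webhook generally adds a sidecar: it answers the API server's AdmissionReview with a base64-encoded JSON Patch that is applied while the pod is being created. This is not the injector's actual code; the function names and the sidecar name and image are hypothetical, chosen only for illustration. The "pod updates may not add or remove containers" message, by contrast, is what the API server returns when an update to an already existing pod changes its container list.

```python
import base64
import json


def build_sidecar_patch(pod, sidecar):
    """Return JSON Patch (RFC 6902) ops that append `sidecar` to the pod's containers.

    Assumes the pod already has a non-empty spec.containers list,
    which is always true for a valid pod.
    """
    if not pod.get("spec", {}).get("containers"):
        raise ValueError("pod has no containers to patch")
    # "-" is the JSON Pointer token meaning "append to the end of the array".
    return [{"op": "add", "path": "/spec/containers/-", "value": sidecar}]


def admission_response(request_uid, patch_ops):
    """Wrap the patch in an admission.k8s.io/v1 AdmissionReview response."""
    encoded = base64.b64encode(json.dumps(patch_ops).encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request_uid,  # must be copied from the request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": encoded,
        },
    }


# Illustrative values only (hypothetical sidecar name and image).
pod = {"spec": {"containers": [{"name": "test", "image": "nginx"}]}}
sidecar = {"name": "example-agent", "image": "example/agent:latest"}
ops = build_sidecar_patch(pod, sidecar)
review = admission_response("request-uid-from-api-server", ops)
```

The point of the sketch is the timing: the patch is part of the CREATE admission flow, so any component that later updates the pod has to send back a spec that still contains the injected containers.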

If I'm the only one affected ATM, this is surely not a priority, but it's worth keeping an eye on it.

HridoyRoy commented 3 years ago

Ok, sounds good @pbriet . I'll keep the issue open so we can dig into it if more information comes up. And, if others run into a similar issue, they can comment with more details as well. Thanks!

pjastrzabek commented 2 years ago

I have a very similar case. I am trying to inject a secret into a StatefulSet. The last log line from the Vault injector is [DEBUG] handler: creating new agent..

but the sidecar is never injected. Instead, I see an event on the StatefulSet saying

create Pod thanos-store-0 in StatefulSet thanos-store failed error: Internal error occurred: failed calling webhook "vault.hashicorp.com": expected response.uid="4b5de1f6-ce6f-4b75-9fc0-b91e5ba66734", got ""
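For anyone reading that event: with admission.k8s.io/v1, the API server expects the webhook's AdmissionReview response to carry the same uid as the request, and an empty uid in the response is rejected with exactly this kind of message. The Python sketch below only illustrates that contract; the helper names are hypothetical, and it is not a diagnosis of why the injector returned an empty uid here.

```python
def build_review_response(review_request, allowed=True):
    """Build a minimal AdmissionReview response that echoes the request uid."""
    request = review_request["request"]
    return {
        "apiVersion": review_request["apiVersion"],
        "kind": "AdmissionReview",
        "response": {"uid": request["uid"], "allowed": allowed},
    }


def uid_matches(review_request, review_response):
    """Mimic the API server's check: response.uid must equal request.uid."""
    expected = review_request["request"]["uid"]
    got = (review_response.get("response") or {}).get("uid", "")
    return expected == got


# The uid below is the one quoted in the event above.
incoming = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {"uid": "4b5de1f6-ce6f-4b75-9fc0-b91e5ba66734"},
}
good = build_review_response(incoming)
bad = {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": {"allowed": True}}
```

Here `uid_matches(incoming, good)` holds, while `uid_matches(incoming, bad)` does not, which corresponds to the `got ""` case in the event.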