hitachienergy / epiphany

Cloud and on-premises automation for Kubernetes-centered, industrial-grade solutions.
Apache License 2.0

[BUG] Vault secrets cannot be injected into Kubernetes pods - AWS/RHEL/flannel|canal #1500

Closed by przemyslavic 4 years ago

przemyslavic commented 4 years ago

**Describe the bug**
Cannot inject Vault secrets into Kubernetes pods in the following configurations:

- AWS/RHEL/flannel

- AWS/RHEL/canal

**To Reproduce**
Steps to reproduce the bug:

  1. Deploy a new cluster
  2. Log in to Vault: `vault login`
  3. Add a secret: `vault kv put secret/devwebapp/config username='test' password='test'`
  4. Deploy the test application (the manifest references the `internal-app` service account and the `devweb-app` Vault role; see the setup sketch after this list):

     apiVersion: apps/v1
     kind: Deployment
     metadata:
       name: devwebapp
       labels:
         app: devwebapp
     spec:
       replicas: 1
       selector:
         matchLabels:
           app: devwebapp
       template:
         metadata:
           labels:
             app: devwebapp
           annotations:
             vault.hashicorp.com/agent-inject: "true"
             vault.hashicorp.com/role: "devweb-app"
             vault.hashicorp.com/agent-inject-secret-credentials.txt: "secret/data/devwebapp/config"
             vault.hashicorp.com/tls-skip-verify: "true"
         spec:
           serviceAccountName: internal-app
           containers:
             - name: app
               image: busybox
               command:
                 - sleep
                 - "3600"
               imagePullPolicy: IfNotPresent
  5. Check the logs of the vault-agent-init container: `kubectl logs devwebapp-xxx-xxx -c vault-agent-init`
  6. Check that the secret exists in the application container: `kubectl exec devwebapp-xxx-xxx -c app -- cat /vault/secrets/credentials.txt`
**Expected behavior**
The secrets are injected into the pod and are accessible from within it.
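
For reference, the injector's default template renders a KV v2 secret as a Go map dump, so step 6 would be expected to print something roughly like this (illustrative output, not captured from the cluster):

    $ kubectl exec devwebapp-xxx-xxx -c app -- cat /vault/secrets/credentials.txt
    data: map[password:test username:test]
    metadata: map[created_time:<timestamp> deletion_time: destroyed:false version:1]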

**Config files**
Configuration that should be included in the yaml file:

---
kind: configuration/vault
title: Vault Config
name: default
provider: aws
specification:
  vault_enabled: true

OS (please complete the following information): RHEL

Cloud Environment (please complete the following information): AWS

**Actual behavior**
There is only one container, named `app`:

[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl get pods -A
NAMESPACE              NAME                                                                      READY   STATUS    RESTARTS   AGE
default                devwebapp-xx-xx                                                 1/1     Running   0          5s

Neither the vault-agent-init nor the vault-agent container exists, so there is no way for secrets to be injected:

[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl logs devwebapp-xx-xx -c vault-agent-init
error: container vault-agent-init is not valid for pod devwebapp-xx-xx
[ec2-user@ec2-xx-xx-xx-xx ~]$ kubectl logs devwebapp-xx-xx -c vault-agent
error: container vault-agent is not valid for pod devwebapp-xx-xx
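
A quick way to confirm that the mutating webhook never modified the pod is to check for the status annotation the injector adds on success (the annotation is set by vault-k8s; the pod name is a placeholder):

    # prints "injected" when the webhook ran; empty output means the pod was never mutated
    kubectl get pod devwebapp-xx-xx -o jsonpath='{.metadata.annotations.vault\.hashicorp\.com/agent-inject-status}'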

**Additional context**
Apiserver logs showing the issue:

I0723 13:46:14.698896       1 trace.go:116] Trace: "Call mutating webhook" configuration:vault-agent-injector-cfg,webhook:vault.hashicorp.com,resource:/v1, Resource=pods,subresource:,operation:CREATE,UID:xxx (started: 2020-07-23 13:45:44.698698084 +0000 UTC m=+5802.412268297) (total time: 30.000155842s):
Trace: [30.000155842s] [30.000155842s] END
W0723 13:46:14.698966       1 dispatcher.go:168] Failed calling webhook, failing open vault.hashicorp.com: failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
E0723 13:46:14.698984       1 dispatcher.go:169] failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s: context deadline exceeded
I0723 13:46:14.702704       1 trace.go:116] Trace: "Create" url:/api/v1/namespaces/default/pods,user-agent:kube-controller-manager/v1.17.7 (linux/amd64) kubernetes/b445510/system:serviceaccount:kube-system:replicaset-controller,client:10.1.2.210 (started: 2020-07-23 13:45:44.693306426 +0000 UTC m=+5802.406876613) (total time: 30.00934833s):
Trace: [30.005736493s] [30.005653171s] About to store object in database
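
The `context deadline exceeded` on the POST to https://vault-agent-injector-svc.vault.svc:443/mutate means the apiserver could not reach the injector over the pod network within the 30s webhook timeout, so the pod was admitted without mutation (the webhook fails open). A sketch of how to confirm the connectivity gap from the master node (the injector pod listens on 8080; the endpoint IP is a placeholder):

    # resolve the webhook service to its backing pod endpoint
    kubectl -n vault get endpoints vault-agent-injector-svc

    # hit the endpoint directly from the master; with the broken CNI this times out
    curl -k --max-time 5 https://<endpoint-ip>:8080/mutate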

I also tested with TLS disabled; exactly the same two configurations, AWS/RHEL/flannel and AWS/RHEL/canal, do not work properly.

Originally posted by @przemyslavic in https://github.com/epiphany-platform/epiphany/issues/1398#issuecomment-663022164

atsikham commented 4 years ago

Can be reproduced in 0.7.0 with the listed configurations, but not with the develop branch.

develop results (HEAD is f4f2e5dc6f2926e54da38ca97cb7613cac9596af)

[root@ec2-54-162-110-36 ec2-user]# kubectl logs vault-agent-injector-7cf744b6fc-mxpq7 -n vault
2020-09-03T06:45:28.154Z [INFO]  handler: Starting handler..
Listening on ":8080"...
Updated certificate bundle received. Updating certs...
2020-09-03T06:57:45.206Z [INFO]  handler: Request received: Method=POST URL=/mutate?timeout=30s

[screenshot: develop results]

0.7.0

[screenshot: 0.7.0 results]

Verified with the following configurations:

---
kind: epiphany-cluster
title: Epiphany cluster Config
provider: aws
name: default
specification:
  name: vault-7<0|1>-<canal|flannel>
  prefix: atsikham
  admin_user:
    name: ec2-user
    key_path: /home/vscode/.ssh/id_rsa
  cloud:
    use_public_ips: true
    credentials:
      key: <replace>
      secret: <replace>
    region: us-east-1
  components:
    kubernetes_master:
      count: 1
      machine: kubernetes-master-machine-rhel
      subnets:
        - availability_zone: us-east-1a
          address_pool: 10.1.2.0/24
    kubernetes_node:
      count: 2
      machine: kubernetes-node-machine-rhel
      subnets:
        - availability_zone: us-east-1a
          address_pool: 10.1.2.0/24
    logging:
      count: 0
    monitoring:
      count: 0
    kafka:
      count: 0
    postgresql:
      count: 0
    load_balancer:
      count: 0
    rabbitmq:
      count: 0
version: <replace>
---
kind: configuration/vault
title: Vault Config
name: default
provider: aws
specification:
  vault_enabled: true
---
kind: infrastructure/virtual-machine
name: kubernetes-master-machine-rhel
provider: aws
based_on: kubernetes-master-machine
specification:
  os_full_name: RHEL-7.8_HVM_GA-20200225-x86_64-1-Hourly2-GP2
---
kind: infrastructure/virtual-machine
name: kubernetes-node-machine-rhel
provider: aws
based_on: kubernetes-node-machine
specification:
  os_full_name: RHEL-7.8_HVM_GA-20200225-x86_64-1-Hourly2-GP2
---
kind: configuration/kubernetes-master
name: default
provider: aws
specification:
  advanced:
    networking:
      plugin: <canal|flannel>
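
Each combination was presumably deployed the standard Epiphany way; a sketch, assuming the documents above are saved to a single file (the file name vault-test.yml is illustrative):

    epicli apply -f vault-test.yml
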
przemyslavic commented 4 years ago

I confirm that the issue no longer occurs on the current develop version. The changes made for version 0.7.1 fixed the problem.