hashicorp / vault-helm

Helm chart to install Vault and other associated components.
Mozilla Public License 2.0

Mutation webhook failing to inject vault sidecars #163

Closed gopisaba closed 4 years ago

gopisaba commented 4 years ago

I am using the latest Vault Helm chart. The mutation webhook is failing to inject the vault-agent and consul-template sidecars.

Error messages on EKS api-server logs
E0106 11:41:35.118590 1 dispatcher.go:71] failed calling webhook "vault.hashicorp.com": Post https://vault-agent-injector-svc.infra-tools.svc:443/mutate?timeout=30s: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I don't see any other error messages on the vault or vault-agent-injector pods. I am able to resolve and connect to vault-agent-injector-svc from a test pod in a different namespace.

vault svc
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: vault-agent-injector
  name: vault-agent-injector-svc
  namespace: infra-tools
spec:
  clusterIP: 172.20.174.199
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8080
  selector:
    app.kubernetes.io/instance: vault
    app.kubernetes.io/name: vault-agent-injector
    component: webhook
  sessionAffinity: None
  type: ClusterIP

jasonodonnell commented 4 years ago

Hi @gopisaba, I just deployed this on EKS in different namespaces, but could not reproduce what you're seeing.

Can you provide me with:

gopisaba commented 4 years ago
global:
  enabled: true
  tlsDisable: false
injector:
  certs:
    secretName: vault-tls

server:
  auditStorage:
    accessMode: ReadWriteOnce
    enabled: true
    size: 10Gi
    storageClass: null
  authDelegator:
    enabled: true
  dataStorage:
    enabled: false
  extraEnvironmentVars:
    VAULT_CACERT: /vault/userconfig/vault-tls/tls.ca
  extraVolumes:
  - name: vault-tls
    type: secret
  ha:
    config: |
      ui = true
      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        tls_cert_file = "/vault/userconfig/vault-tls/tls.crt"
        tls_key_file  = "/vault/userconfig/vault-tls/tls.key"
        tls_client_ca_file = "/vault/userconfig/vault-tls/tls.ca"
      }
      storage "dynamodb" {
        ha_enabled = "true"
        region     = "eu-west-1"
        table      = "vault-backend"
      }
      seal "awskms" {
        region     = "eu-west-1"
        kms_key_id = "1ee6b01a-1d8a-4cfb-abcd-12bdc43ab8d2"
        endpoint   = "https://vpce-01234567890-6abcdef.kms.eu-west-1.vpce.amazonaws.com"
      }
    enabled: true
    replicas: 3
  ingress:
    enabled: false
  nodeSelector: |
    nodeType: grp1
  standalone:
    enabled: false
ui:
  enabled: true
  serviceNodePort: 32582
  serviceType: NodePort

Kube Version = 1.14 (EKS)

✔ k describe svc vault-agent-injector-svc -n infra-tools
Name:              vault-agent-injector-svc
Namespace:         infra-tools
Labels:            app.kubernetes.io/instance=vault
                   app.kubernetes.io/managed-by=Tiller
                   app.kubernetes.io/name=vault-agent-injector
Annotations:       flux.weave.works/antecedent: infra-tools:helmrelease/vault
Selector:          app.kubernetes.io/instance=vault,app.kubernetes.io/name=vault-agent-injector,component=webhook
Type:              ClusterIP
IP:                172.20.174.199
Port:              <unset>  443/TCP
TargetPort:        8080/TCP
Endpoints:         100.64.3.76:8080
Session Affinity:  None
Events:            <none>

krep-dr commented 4 years ago

@gopisaba could be the same problem I had https://github.com/hashicorp/vault-k8s/issues/46

gopisaba commented 4 years ago

@krep-dr - That's it. After opening port 8080 between the EKS cluster and the worker nodes, the mutation webhook started working. Thanks for pointing me in the right direction.

DongshengXiong-old commented 4 years ago

@gopisaba what is the EKS cluster IP range? Or how can I find out the range? I do have the same issue. Thanks!

gopisaba commented 4 years ago

@DongshengXiong - Allowing traffic from the EKS cluster security group to the EKS worker nodes security group on port 8080 fixed the issue for me.

DongshengXiong-old commented 4 years ago

@gopisaba thanks for your reply. Actually, I am using the Weave Net CNI. My issue was fixed by this solution (https://github.com/hashicorp/vault-k8s/issues/72)

pksurferdad commented 4 years ago

Hi @DongshengXiong what specifically did you change on the EKS security group? Did you use eksctl to set up your cluster? If so, which security group did you change and which security group was the source for the inbound rule?

dvyas1 commented 1 year ago

> Hi @DongshengXiong what specifically did you change on the EKS security group? Did you use eksctl to set up your cluster? If so, which security group did you change and which security group was the source for the inbound rule?

I know it's been a while since this was asked and you probably know the answer by now, but for anyone else: there are two security groups you will need to change, one for inbound and one for outbound.

1) There should be a security group named something like "-cluster". It has just one inbound rule, on port 443. Add an outbound rule to this group on port 8080/TCP; the destination should be the security group that is attached to all nodes. There should already be two other outbound rules (ports 443 and 10250), and you can use the same destination group ID as those.
2) Add an inbound rule to the destination group from above on port 8080/TCP, with the group from step 1 (the cluster API server group) as the source.

Read the comments on the inbound and outbound security rules to figure out which group is used for what.
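
For anyone managing these groups with Terraform rather than the console, the two rules above could be sketched roughly as follows. This is a minimal sketch, not a tested configuration: the variable names `cluster_security_group_id` and `node_security_group_id` and the resource names are hypothetical, and the sketch assumes both security groups already exist and that the injector listens on port 8080.

```hcl
# Hypothetical inputs: IDs of the two existing security groups.
variable "cluster_security_group_id" {
  description = "Security group attached to the EKS control plane (the '-cluster' group)"
  type        = string
}

variable "node_security_group_id" {
  description = "Security group attached to all worker nodes"
  type        = string
}

# Step 1: outbound rule on the cluster group, towards the nodes, on 8080/TCP.
resource "aws_security_group_rule" "cluster_egress_vault_injector" {
  description       = "Allow API server to reach the Vault Agent Injector webhook"
  type              = "egress"
  protocol          = "tcp"
  from_port         = 8080
  to_port           = 8080
  security_group_id = var.cluster_security_group_id
  # For an egress rule, this argument names the destination group.
  source_security_group_id = var.node_security_group_id
}

# Step 2: matching inbound rule on the node group, from the cluster group.
resource "aws_security_group_rule" "node_ingress_vault_injector" {
  description              = "Allow Vault Agent Injector webhook calls from the API server"
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 8080
  to_port                  = 8080
  security_group_id        = var.node_security_group_id
  source_security_group_id = var.cluster_security_group_id
}
```

Both rules are needed only if the cluster group's outbound traffic is restricted, as described in step 1; if that group already allows all outbound traffic, the ingress rule alone should be enough.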

kschoche commented 1 year ago

I ran into this issue the other day when using Terraform to deploy EKS with the terraform-aws-modules/eks/aws module and wanted to share my fix, in the hope that the next person doing this finds it helpful.

When defining the EKS module, you need to add the following node_security_group_additional_rules:

node_security_group_additional_rules = {
  ingress_vault_injector_webhook = {
    description                   = "Access to Vault Agent Injector webhook endpoint from API server"
    protocol                      = "tcp"
    from_port                     = 8080
    to_port                       = 8080
    type                          = "ingress"
    source_cluster_security_group = true
  }
}

younsl commented 5 months ago

This solution works well in the EKS cluster. Thanks to @kschoche!


Problem

E0610 20:50:30.214031      10 dispatcher.go:214] failed calling webhook "vault.hashicorp.com": failed to call webhook: Post "https://vault-agent-injector-svc.vault.svc:443/mutate?timeout=30s": context deadline exceeded

Environment

Solution

The vault-agent-injector pod answers the mutating webhook calls registered by the MutatingWebhookConfiguration named vault-agent-injector-cfg, and typically listens on tcp/8080.


In the official Vault Helm chart, the values related to the vault-agent-injector pod are as follows:

# vault-helm/values.yaml
injector:
  # True if you want to enable vault agent injection.
  # @default: global.enabled
  enabled: "-"

  replicas: 1

  # Configures the port the injector should listen on
  port: 8080

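The Service accepts the webhook call on port 443 but forwards it to the pod on port 8080, so the control plane must be able to reach the worker nodes on 8080. Add an inbound rule to the worker node security group (SG) that allows TCP 8080 with the control plane security group as the source.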

---
title: Kubernetes architecture (EKS v1.30)
---
flowchart LR
  subgraph Control plane
    C["kube-apiserver"]
  end
    S["vault-agent-injector-svc"]
  subgraph Worker node
    P["vault-agent-injector"]
  end
  C --"tcp/443"--> S:::blue -. tcp/8080 .-> P
  classDef blue stroke:#00f

Example in Terraform using the EKS module

Add an inbound rule for TCP port 8080 to the node_security_group_additional_rules value provided by the EKS module.

module "eks" {
  # ... truncated ...
  node_security_group_additional_rules = {
    ingress_vault_agent_injector_mutating_webhook = {
      description                   = "Allow ingress mutating webhook traffic from kube-apiserver to vault-agent-injector pod"
      protocol                      = "tcp"
      from_port                     = 8080
      to_port                       = 8080
      type                          = "ingress"
      source_cluster_security_group = true
    }
    # Similar case for linkerd-viz tap pod's api service
    ingress_linkerd_viz_tap_api = {
      description                   = "Allow ingress api calling traffic from kube-apiserver to linkerd-viz tap pod"
      protocol                      = "tcp"
      from_port                     = 8088
      to_port                       = 8089
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
  # ... truncated ...
}
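
If injector.port is overridden in the chart values, the rule has to allow that port instead of 8080. One way to keep the two in step is to take the port from a single variable, sketched below. The variable name `vault_injector_port` is hypothetical, the module block is truncated in the same way as the example above, and the value still has to be passed to the Helm release separately.

```hcl
# Hypothetical variable: must match injector.port in the vault-helm values.
variable "vault_injector_port" {
  description = "Port the Vault Agent Injector listens on (injector.port in vault-helm)"
  type        = number
  default     = 8080
}

module "eks" {
  source = "terraform-aws-modules/eks/aws"
  # ... other module arguments omitted ...

  node_security_group_additional_rules = {
    ingress_vault_agent_injector_mutating_webhook = {
      description                   = "Allow mutating webhook traffic from kube-apiserver to vault-agent-injector pod"
      protocol                      = "tcp"
      from_port                     = var.vault_injector_port
      to_port                       = var.vault_injector_port
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
}
```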

Reference

Similar case: Linkerd-Viz Tap FailedDiscoveryCheck while Running on EKS