hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Kubernetes Service Discovery on k8s 1.10 #8767

Open lukpre opened 4 years ago

lukpre commented 4 years ago

Hey

We have a customer running a Kubernetes cluster with version 1.10.11, and we tried to install Vault 1.4.0 via your vault-helm chart (0.5.0). Unfortunately it's not possible to get it working, because the pods never get the labels that are documented here: https://www.vaultproject.io/docs/configuration/service-registration/kubernetes

It's not an RBAC issue, because we tested manually setting labels on pods with the serviceAccount that runs Vault, and there were no problems doing this. We even temporarily gave the serviceAccount admin rights in the namespace, which also didn't change a thing.

Unfortunately there is not much log information about why it didn't work (see below). The only problem is that the labels for the service registration are never patched onto the pods, so the Kubernetes service discovery doesn't work, which makes it impossible to initialize and use Vault with k8s 1.10.

We have tested the Chart 0.5.0 with Vault 1.4.0 on a microk8s 1.11 instance and it worked right away.

So I'm wondering: what might cause this not to work on Kubernetes 1.10? My only guess so far is that the Kubernetes Go client libraries handle some things differently and are no longer compatible with 1.10.

I would greatly appreciate any hints, ideas or an explanation why it doesn't work.

Environment:

Startup Log Output:

[DEBUG] service_registration.kubernetes: "namespace": "vault-testing"
[DEBUG] service_registration.kubernetes: "pod_name": "vault-2"
[WARN]  service_registration.kubernetes: unable to set initial state due to PATCH https://172.24.0.1:443/api/v1/namespaces/vault-testing/pods/vault-2 giving up after 11 attempts, will retry

Logs when trying to initialize Vault:

Error initializing: context deadline exceeded

Expected Behavior: Pods should get patched with the labels required for Kubernetes service discovery.
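
For reference, when service registration works, each Vault pod ends up with labels roughly like the following (the values shown are illustrative; a working example appears at the end of this thread):

metadata:
  labels:
    vault-active: "false"
    vault-initialized: "true"
    vault-perf-standby: "false"
    vault-sealed: "false"
    vault-version: "1.4.0"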

Actual Behavior: No labels are set and therefore the service discovery mechanism isn't working.

Steps to Reproduce: Use a k8s or microk8s 1.10 cluster, install the vault-helm chart in HA mode with Raft enabled, then run vault operator init; service discovery will not work. Verify that the labels were not applied with kubectl get pods vault-0 -o jsonpath='{.metadata.labels}'. (A minimal set of values overrides is sketched below.)
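
For reference, the values overrides for that setup might look like this (everything else left at the 0.5.0 chart defaults):

server:
  standalone:
    enabled: false
  ha:
    enabled: true
    raft:
      enabled: true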

pcman312 commented 4 years ago

Moving this to https://github.com/hashicorp/vault-helm/

lukpre commented 4 years ago

I'm not sure this is related to the chart itself, because there are no problems deploying it. All the resources get created correctly. It's only that, once everything is deployed, Vault can't start in HA mode because the service registration mechanism doesn't seem to be able to patch the pods. I would have guessed this has something to do with the PR in the vault repo, #8249. I assume the log entry I posted gets created here.

So either the requirements have changed and Kubernetes 1.11 is now a prerequisite, or there is something in the implementation of the Kubernetes service registration (see the PR above) that is no longer compatible with 1.10.

Excited to hear any news.

Thank you in advance

pcman312 commented 4 years ago

Oh my apologies. I misunderstood where the problem was. I'll transfer this back to the Vault repo.

tyrannosaurus-becks commented 4 years ago

Hi! Thanks for reporting this!

Can you give additional steps to reproduce? For instance, many folks edit the default values.yaml or provide values to override the defaults, as shown here. That will help us understand the config that's in use.

Thanks again!

bitfehler commented 4 years ago

I think this is a documentation bug. The documentation says to create a Role allowing verbs: ["get", "update"], but Vault actually uses the PATCH method. If you change your Role to allow verbs: ["get", "patch", "update"] it should work; at least it did for me.
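
A minimal sketch of such a Role (the name and namespace are assumed; the vault-helm chart templates its own):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vault-service-discovery  # assumed name
  namespace: vault               # assumed namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "patch", "update"]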

lukpre commented 4 years ago

@tyrannosaurus-becks : Thanks a lot for your feedback.

The only things that we have changed in values.yaml are the following:

server.standalone.enabled=false
server.ha.enabled=true
server.ha.raft.enabled=true

No additional configuration so far. Default 0.5.0 helm chart with the above changes. Basically just Vault with Raft in HA mode.

@bitfehler: Thanks for the input, but the Role for the service discovery already has those verbs (https://github.com/hashicorp/vault-helm/blob/master/templates/server-discovery-role.yaml#L17) and it gets created correctly.

vvanghelle commented 4 years ago

Hi there,

Getting the same error here: "service_registration.kubernetes: unable to set initial state due to PATCH https://172.20.0.1:443/api/v1/namespaces/vault/pods/vault-0 giving up after 11 attempts"

Running helm chart 0.5.0 with HA enabled (standalone is working fine) on EKS (Kubernetes 1.15 with auto scaling groups). The role on the nodes has the correct policies (https://www.vaultproject.io/docs/platform/k8s/helm/examples/enterprise-best-practice#walk-through).

The chart in HA mode is working fine on minikube locally.

vvanghelle commented 4 years ago

Hi @lukpre

After some chats with my network team, it seems networking can be complicated with AWS and proxies, so we ended up adding hostNetwork: true on the StatefulSet for Vault. Label modification now seems to be working, and I can then unseal my pods.
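
For anyone wanting to try the same workaround, the change amounts to setting hostNetwork on the pod template of the Vault StatefulSet, roughly like this (names assumed, everything else unchanged):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: vault  # assumed release name
spec:
  template:
    spec:
      hostNetwork: true  # pod shares the node's network namespace
      # ... containers, volumes, etc. unchanged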

lukpre commented 4 years ago

@vvanghelle: Thank you for your feedback. We don't run k8s on AWS, and our version is 1.10, not 1.15, so I'm not sure that's the same problem. Also, as I mentioned, it works perfectly fine on k8s 1.11. But I will check if that changes something.

jbielick commented 4 years ago

Just installed the chart 0.6.0 with image vault:1.4.2 on a GKE 1.14 cluster, and it's stalled forever on service_registration.kubernetes: unable to set initial state due to PATCH https://10.2.0.1:443/api/v1/namespaces/vault/pods/vault-0 giving up after 11 attempts, will retry.

All operator commands time out. I assume this is due to the missing labels.

Values:

server:
  extraArgs: "-log-level=debug"
  standalone:
    enabled: false
  ha:
    enabled: true
    raft:
      enabled: true

The service account token inside the pods appears to work for GET /:namespace/pods/vault-0; I haven't tried PATCH yet. The DEBUG log level doesn't seem to print anything additional from the service_registration logger.

I tried deleting all pods so they would be recreated, in case there was a race condition and the default service account token was being used instead of the token bound via the server-discovery role. This had no effect.

Kube server info:

Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.36", GitCommit:"34a615f32e9a0c9e97cdb9f749adb392758349a6", GitTreeState:"clean", BuildDate:"2020-04-06T16:33:17Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

Nodes are v1.13.11-gke.14.

jbielick commented 4 years ago

I believe I've discovered the issue here, if only for GKE clusters.

Debugging

It wasn't until increasing some logging that I was able to see why the PATCH request was failing. The easiest way to do this was to provide the client.logger (an hclog.Logger) to the retryablehttp.Client, as in this diff:

diff --git a/serviceregistration/kubernetes/client/client.go b/serviceregistration/kubernetes/client/client.go
index 934d3bad9..44c3ca084 100644
--- a/serviceregistration/kubernetes/client/client.go
+++ b/serviceregistration/kubernetes/client/client.go
@@ -153,6 +153,7 @@ func (c *Client) do(req *http.Request, ptrToReturnObj interface{}) error {
                RetryWaitMin: RetryWaitMin,
                RetryWaitMax: RetryWaitMax,
                RetryMax:     RetryMax,
+               Logger:       c.logger,
                CheckRetry:   c.getCheckRetry(req),
                Backoff:      retryablehttp.DefaultBackoff,
        }

Instead of something a little cryptic, like

[WARN]  service_registration.kubernetes: unable to set initial state due to PATCH https://a.b.c.d:443/api/v1/namespaces/vault/pods/vault-0 giving up after 11 attempts, will retry

I was able to see

[DEBUG] service_registration.kubernetes: retrying request: request="PATCH https://10.2.0.1:443/api/v1/namespaces/vault/pods/vault-0 (status: 504)" timeout=30s remaining=4

when vault's log level was set to debug.

Would a PR be accepted for this change?

Root Cause

It might appear that the client is timing out while connecting to the Kubernetes API, but it is the API itself returning a 504 with a JSON error response of

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Timeout: request did not complete within requested timeout 30s",
  "reason": "Timeout",
  "details": {

  },
  "code": 504
}

It turns out that any PATCH to a pod in this namespace would produce the same error, suggesting that some admission controller (or similar) was doing additional processing on the request and timing out while doing so.

That led me to find these issues which had similar symptoms:

https://github.com/elastic/cloud-on-k8s/issues/1673
https://github.com/helm/charts/issues/16174
https://github.com/helm/charts/issues/16249#issuecomment-520795222

Common variables:

The vault-helm chart does indeed install a MutatingWebhookConfiguration, as seen here, requesting a webhook for any UPDATE or CREATE of any pod in the namespace.
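
For illustration, the relevant shape of that webhook configuration is roughly the following (a sketch assembled from this thread, not copied from the chart; the webhook name and path are assumptions):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: vault-agent-injector-cfg
webhooks:
- name: vault.hashicorp.com           # assumed webhook name
  admissionReviewVersions: ["v1"]
  sideEffects: None
  clientConfig:
    service:
      name: vault-agent-injector-svc  # the injector service; its target port is 8080
      namespace: vault                # assumed namespace
      path: /mutate                   # assumed path
  rules:
  - operations: ["CREATE", "UPDATE"]
    apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]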

The Kubernetes API server is the one making the webhook request to a node (a service within the cluster), and if a firewall rule does not allow traffic from the master to the nodes on tcp:8080, the master is unable to reach the service and times out. The PATCH request thus times out as a cascading failure.

The solution for GKE Private Clusters is to add a firewall rule allowing this traffic (master CIDR -> nodes [by network tag] on tcp:8080) as described here.

I suspect a similar issue exists in other environments. Before making a firewall rule, you could verify that this issue is affecting you by disabling the agent-injector via injector.enabled=false in the values, or by deleting the MutatingWebhookConfiguration named vault-agent-injector-cfg.
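
In Helm values, that temporary test looks like the following (this disables the agent injector entirely, so only use it to confirm the diagnosis):

injector:
  enabled: false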

mariusgiger commented 4 years ago

The solution for GKE Private Clusters is to add a firewall rule allowing this traffic (master CIDR -> nodes [by network tag] on tcp:9443) as described here.

I also had to add port 8080 to the firewall rule for the service_registration to work. Any ideas why this is necessary?

jbielick commented 4 years ago

@mariusgiger Thanks for pointing that out. I think my original statement was incorrect, since the MutatingWebhook in question here is for agent-injector-svc (the Vault Agent Injector Service), whose target port is 8080.

https://github.com/hashicorp/vault-helm/blob/7a8180862e488770262aa94ba81a65999fa8bda9/templates/injector-service.yaml#L14

So 8080 is the correct port to allow in the firewall rules. I'll update my comment. 9443 is used by some other helm chart services that accept mutating webhooks, so it may be useful to allow that as well.

ib-ak commented 3 years ago

In my case it was PodSecurityPolicy. I am on AWS, k8s version 1.19, with a custom PSP.

Right now it works with a privileged policy (eks.privileged if you are on AWS).

My suggestion for debugging the exact root cause would be to check the API server logs. Below is my API audit message from AWS CloudWatch:

{
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "RequestResponse",
    "auditID": "xxx",
    "stage": "ResponseComplete",
    "requestURI": "/api/v1/namespaces/vault/pods/vault-0",
    "verb": "patch",
    "user": {
        "username": "system:serviceaccount:vault:vault-seal-controller",
        "uid": "xxx",
        "groups": [
            "system:serviceaccounts",
            "system:serviceaccounts:vault",
            "system:authenticated"
        ]
    },
    "sourceIPs": [
        "xxxx"
    ],
    "userAgent": "Go-http-client/1.1",
    "objectRef": {
        "resource": "pods",
        "namespace": "vault",
        "name": "vault-0",
        "apiVersion": "v1"
    },
    "responseStatus": {
        "metadata": {},
        "status": "Failure",
        "reason": "Forbidden",
        "code": 403
    },
    "requestObject": [
        {
            "op": "replace",
            "path": "/metadata/labels/vault-version",
            "value": "1.6.2"
        },
        {
            "op": "replace",
            "path": "/metadata/labels/vault-active",
            "value": "false"
        },
        {
            "op": "replace",
            "path": "/metadata/labels/vault-sealed",
            "value": "true"
        },
        {
            "op": "replace",
            "path": "/metadata/labels/vault-perf-standby",
            "value": "false"
        },
        {
            "op": "replace",
            "path": "/metadata/labels/vault-initialized",
            "value": "false"
        }
    ],
    "responseObject": {
        "kind": "Status",
        "apiVersion": "v1",
        "metadata": {},
        "status": "Failure",
        "message": "pods \"vault-0\" is forbidden: PodSecurityPolicy: unable to validate pod: []",
        "reason": "Forbidden",
        "details": {
            "name": "vault-0",
            "kind": "pods"
        },
        "code": 403
    },
    "requestReceivedTimestamp": "2021-03-10T12:25:34.557196Z",
    "stageTimestamp": "2021-03-10T12:25:34.574131Z",
    "annotations": {
        "authentication.k8s.io/legacy-token": "system:serviceaccount:vault:vault-seal-controller",
        "authorization.k8s.io/decision": "allow",
        "authorization.k8s.io/reason": "RBAC: allowed by RoleBinding \"vault-discovery-rolebinding/vault\" of Role \"vault-discovery-role\" to ServiceAccount \"vault-seal-controller/vault\""
    }
}

Still trying to figure out what the least-privilege policy for the pod is. Any input is appreciated.

dniasoff commented 2 years ago

I just had the same issue with vault helm chart version 0.19 and have spent hours troubleshooting.

I am using Kubernetes version 1.22.6 on Fedora CoreOS with the following admission controllers enabled: CertificateApproval,CertificateSigning,CertificateSubjectRestriction,DefaultIngressClass,DefaultStorageClass,DefaultTolerationSeconds,LimitRanger,MutatingAdmissionWebhook,NamespaceLifecycle,PersistentVolumeClaimResize,PodSecurity,PodSecurityPolicy,Priority,ResourceQuota,RuntimeClass,ServiceAccount,StorageObjectInUseProtection,TaintNodesByCondition,ValidatingAdmissionWebhook

Turns out I had to make a number of changes to the existing PodSecurityPolicy, and then it started working.

Here is my updated PodSecurityPolicy:

apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: vault
spec:
  allowPrivilegeEscalation: true
  fsGroup:
    rule: RunAsAny
  hostIPC: false
  hostNetwork: false
  hostPID: false
  privileged: true
  readOnlyRootFilesystem: false
  runAsUser:
    rule: RunAsAny
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  volumes:
    - configMap
    - emptyDir
    - projected
    - secret
    - downwardAPI
    - persistentVolumeClaim

Note that I had to remove the following annotations:

  annotations:
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default,runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default

hixichen commented 2 years ago

For anyone who has read this far and is using a GKE private cluster:

Make sure you have added the Role and RoleBinding:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: default  # make sure to use `default` which can be auto replaced by kustomize
  name: hashicorp-vault-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: hashicorp-vault-role-binding
  namespace: default  # make sure to use `default` which can be auto replaced by kustomize
  labels:
    app: hashicorp-vault
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: hashicorp-vault-role
subjects:
- kind: ServiceAccount
  name: hashicorp-vault
  namespace: default  # make sure to use `default` which can be auto replaced by kustomize

I actually tested the firewall rule and master authorized networks; without any additional work, it works on my GKE private cluster.


Note: my pod CIDR is 10.1.0.0/16.

Result:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-04-13T18:00:08Z"
  labels:
    app: hashicorp-vault
    component: server
    controller-revision-hash: hashicorp-vault-5b446c757b
    environment: dev
    statefulset.kubernetes.io/pod-name: hashicorp-vault-2
    vault-active: "false"
    vault-initialized: "true"
    vault-perf-standby: "false"
    vault-sealed: "false"
    vault-version: 1.10.0
  name: hashicorp-vault-2
  namespace: dev
status:
  hostIP: 10.138.0.8
  podIP: 10.1.2.24
  podIPs:
  - ip: 10.1.2.24
  startTime: "2022-04-13T18:00:08Z"

Vault version: 1.10.0, open source