hashicorp / vault-helm

Helm chart to install Vault and other associated components.
Mozilla Public License 2.0

Consul Client Not Running on HA Mode of Vault #280

Open rdeb22 opened 4 years ago

rdeb22 commented 4 years ago

Vault should have a Consul client running, but Consul is not running:

kubectl exec vault-helm-0 -it sh
/ $ ps -ef | grep consul
28322 vault     0:00 grep consul

I have deployed Vault on k8s in HA mode and want to use Consul as the storage backend. The pods are not becoming ready:

vault-helm-0                           0/1     Running   0          162m
vault-helm-1                           0/1     Running   0          162m
vault-helm-2                           0/1     Running   0          162m

Logs

WARNING! Unable to read storage migration status.
2020-04-23T18:27:15.699Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-04-23T18:27:15.701Z [WARN]  storage migration check error: error="Get http://127.0.0.1:8500/v1/kv/vault/core/migration: dial tcp 127.0.0.1:8500: connect: connection refused"

Describing the Pod

Name:               vault-helm-0
Namespace:          spr-xxx
Priority:           0
PriorityClassName:  <none>
Node:               qa4-apps-k8s-node-202003241110-10-1a/10.xx.xxx.xxx
Start Time:         Thu, 23 Apr 2020 18:27:13 +0000
Labels:             app.kubernetes.io/instance=vault-helm
                    app.kubernetes.io/name=vault
                    component=server
                    controller-revision-hash=vault-helm-764cc498f5
                    helm.sh/chart=vault-0.5.0
                    statefulset.kubernetes.io/pod-name=vault-helm-0
Annotations:        cni.projectcalico.org/podIP: 192.168.43.48/32
                    kubernetes.io/limit-ranger: LimitRanger plugin set: cpu, memory request for container vault; cpu, memory limit for container vault
Status:             Running
IP:                 192.168.xx.xx
Controlled By:      StatefulSet/vault-helm
Containers:
  vault:
    Container ID:  docker://a0e8c5b0ac6c181ea0b4a8871edf4a41967780520e3ff2be1c3d7b183518fe60
    Image:         vault:1.3.2
    Image ID:      docker-pullable://vault@sha256:cf9d54f9a5ead66076066e208dbdca2094531036d4b053c596341cefb17ebf95
    Ports:         8200/TCP, 8201/TCP, 8202/TCP
    Host Ports:    0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/sh
      -ec
    Args:
      sed -E "s/HOST_IP/${HOST_IP?}/g" /vault/config/extraconfig-from-values.hcl > /tmp/storageconfig.hcl;
      sed -Ei "s/POD_IP/${POD_IP?}/g" /tmp/storageconfig.hcl;
      /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl

    State:          Running
      Started:      Thu, 23 Apr 2020 18:27:15 +0000
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      500m
      memory:   256Mi
    Readiness:  exec [/bin/sh -ec vault status -tls-skip-verify] delay=5s timeout=5s period=3s #success=1 #failure=2
    Environment:
      HOST_IP:               (v1:status.hostIP)
      POD_IP:                (v1:status.podIP)
      VAULT_K8S_POD_NAME:   vault-helm-0 (v1:metadata.name)
      VAULT_K8S_NAMESPACE:  spr-xxx (v1:metadata.namespace)
      VAULT_ADDR:           https://127.0.0.1:8200
      VAULT_API_ADDR:       https://$(POD_IP):8200
      SKIP_CHOWN:           true
      SKIP_SETCAP:          true
      HOSTNAME:             vault-helm-0 (v1:metadata.name)
      VAULT_CLUSTER_ADDR:   https://$(HOSTNAME).vault-helm-internal:8201
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from vault-helm-token-ptt4p (ro)
      /vault/config from config (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      vault-helm-config
    Optional:  false
  vault-helm-token-ptt4p:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  vault-helm-token-ptt4p
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                 From                                           Message
  ----     ------     ----                ----                                           -------
  Normal   Scheduled  89s                 default-scheduler                              Successfully assigned spr-ops/vault-helm-0 to qa4-apps-k8s-node-202003241110-10-1a
  Normal   Pulled     88s                 kubelet, k8s-node-202003241110-10-1a  Container image "vault:1.3.2" already present on machine
  Normal   Created    87s                 kubelet, k8s-node-202003241110-10-1a  Created container
  Normal   Started    87s                 kubelet, k8s-node-202003241110-10-1a  Started container
  Warning  Unhealthy  18s (x22 over 81s)  kubelet, k8s-node-202003241110-10-1a  Readiness probe failed: Error checking seal status: Get https://127.0.0.1:8200/v1/sys/seal-status: dial tcp 127.0.0.1:8200: connect: connection refused

Also, why am I receiving an error even after setting the VAULT_ADDR environment variable?

kubectl exec vault-helm-0 -it sh
/ $ export VAULT_ADDR=http://127.0.0.1:8200
/ $ vault -v
Vault v1.3.2
/ $ vault operator init -n 1 -t 1
Error initializing: Put http://127.0.0.1:8200/v1/sys/init: dial tcp 127.0.0.1:8200: connect: connection refused
jasonodonnell commented 4 years ago

Hi @rdeb22,

We don't include a Consul agent with the Vault deployment. Instead, consul-helm deploys the Consul agent as a daemonset so it runs on every worker node. Vault can access the Consul agent running on the node via localhost (127.0.0.1).
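
For reference, a minimal consul-helm values sketch (names and values here are illustrative assumptions, not taken from this thread) that enables the client daemonset so every node exposes a local agent on port 8500:

client:
  enabled: true        # runs the Consul agent as a daemonset on every worker node
server:
  replicas: 3
  bootstrapExpect: 3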

rdeb22 commented 4 years ago

@jasonodonnell Thanks for the information,

I did what you suggested and installed consul-helm in client mode, but I am still seeing these connection errors.

      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "127.0.0.1:8200"
      }
      storage "consul" {
        path = "vault/"
        address = "127.0.0.1"
      }

Getting Connection Refused

[root@qa4-ops2-k8s-master-20191021071333-1-1b 10.11.92.39:~] k logs vault-helm-0

WARNING! Unable to read storage migration status.
2020-04-29T19:42:25.923Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-04-29T19:42:25.923Z [WARN]  storage migration check error: error="Get http://127.0.0.1/v1/kv/vault/core/migration: dial tcp 127.0.0.1:80: connect: connection refused"
jasonodonnell commented 4 years ago

Hi @rdeb22,

Your config needs a little tweaking (I misled you in my last comment; it needs to specify Consul's port):

      storage "consul" {
        path = "vault/"
        address = "127.0.0.1:8500"
      }
jasonodonnell commented 4 years ago

Actually, this might need to be the host's IP address. Vault Helm will do this for you automatically if you configure it like this:

      storage "consul" {
        path = "vault/"
        address = "HOST_IP:8500"
      }
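
The chart substitutes HOST_IP at startup via the downward-API environment (the sed commands are visible in the pod description above). As a quick sanity check, you can query the node's Consul agent from inside the Vault pod — a diagnostic sketch, assuming the busybox wget shipped in the Vault image:

kubectl exec -it vault-helm-0 -- sh
/ $ wget -qO- "http://${HOST_IP}:8500/v1/status/leader"    # should print the Consul leader address if the agent is reachable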
rdeb22 commented 4 years ago

@jasonodonnell Thank you again, and no problem, but sadly this also did not work, see below. I set address = "HOST_IP:8500" and deployed. Now:

k get pods | grep vaul
vault-helm-0                                                     0/1     Running            0          87s
vault-helm-1                                                     0/1     Running            0          87s
vault-helm-2                                                     0/1     Running            0          87s

The logs now show "Unexpected response code: 500":

k logs vault-helm-0
2020-04-30T11:02:27.887Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-04-30T11:02:27.888Z [WARN]  storage migration check error: error="Unexpected response code: 500"

WARNING! Unable to read storage migration status.
2020-04-30T11:02:29.889Z [WARN]  storage migration check error: error="Unexpected response code: 500"

On describing the pod I see:

Events:
  Type     Reason     Age                   From                                                  Message
  ----     ------     ----                  ----                                                  -------
  Normal   Scheduled  2m29s                 default-scheduler                                     Successfully assigned spr-ops/vault-helm-0 to k8s-openebs-node-202003110542-2-1b
  Normal   Pulled     2m28s                 kubelet, k8s-openebs-node-202003110542-2-1b  Container image "vault:1.3.2" already present on machine
  Normal   Created    2m28s                 kubelet, k8s-openebs-node-202003110542-2-1b  Created container
  Normal   Started    2m28s                 kubelet, k8s-openebs-node-202003110542-2-1b  Started container
  Warning  Unhealthy  78s (x22 over 2m21s)  kubelet, k8s-openebs-node-202003110542-2-1b  Readiness probe failed: Error checking seal status: Get http://127.0.0.1:8200/v1/sys/seal-status: dial tcp 127.0.0.1:8200: connect: connection refused

But,

k exec -it vault-helm-0 sh
/ $ export VAULT_ADDR=http://127.0.0.1:8200
/ $ vault status
Error checking seal status: Get http://127.0.0.1:8200/v1/sys/seal-status: dial tcp 127.0.0.1:8200: connect: connection refused
/ $ ps -ef | grep consul
 5790 vault     0:00 grep consul
/ $ ps -ef | grep vault
    1 vault     0:00 /bin/sh -ec sed -E "s/HOST_IP/${HOST_IP?}/g" /vault/config/extraconfig-from-values.hcl > /tmp/storageconfig.hcl; sed -Ei "s/POD_IP/${POD_IP?}/g" /tmp/storageconfig.hcl; /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl
    9 vault     0:00 {docker-entrypoi} /usr/bin/dumb-init /bin/sh /usr/local/bin/docker-entrypoint.sh vault server -config=/tmp/storageconfig.hcl
   10 vault     0:00 vault server -config=/tmp/storageconfig.hcl
 4158 vault     0:00 sh
 5818 vault     0:00 ps -ef
 5819 vault     0:00 sh
/ $ cat /tmp/storageconfig.hcl
disable_mlock = true
ui = true

listener "tcp" {
  tls_disable = 1
  address = "127.0.0.1:8200"
}
storage "consul" {
  path = "vault/"
  address = "10.11.04.18:8500"
}

#service_registration "kubernetes" {}

# Example configuration for using auto-unseal, using Google Cloud KMS. The
# GKMS keys must already exist, and the cluster must have a service account
# that is authorized to access GCP KMS.
#seal "gcpckms" {
#   project     = "vault-helm-dev-246514"
#   region      = "global"
#   key_ring    = "vault-helm-unseal-kr"
#   crypto_key  = "vault-helm-unseal-key"
#}
/ $

Please help. Thanks

jasonodonnell commented 4 years ago

Hi @rdeb22, can you make the following changes to your config?

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
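
Putting the pieces together, the corresponding ha config block in the Vault chart values would look roughly like this (a sketch only; HOST_IP is a literal placeholder that the chart replaces with the node's IP at startup):

server:
  ha:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "consul" {
        path = "vault/"
        address = "HOST_IP:8500"
      }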
rdeb22 commented 4 years ago

@jasonodonnell Sure, I did, but it's the same thing.

/ $ export VAULT_ADDR=http://127.0.0.1:8200
/ $ vault status
Error checking seal status: Get http://127.0.0.1:8200/v1/sys/seal-status: dial tcp 127.0.0.1:8200: connect: connection refused
/ $ cat /tmp/storageconfig.hcl
disable_mlock = true
ui = true

listener "tcp" {
  tls_disable = 1
  address = "[::]:8200"
  cluster_address = "[::]:8201"
}
storage "consul" {
  path = "vault/"
  address = "10.11.12.42:8500"
}

#service_registration "kubernetes" {}

# Example configuration for using auto-unseal, using Google Cloud KMS. The
# GKMS keys must already exist, and the cluster must have a service account
# that is authorized to access GCP KMS.
#seal "gcpckms" {
#   project     = "vault-helm-dev-246514"
#   region      = "global"
#   key_ring    = "vault-helm-unseal-kr"
#   crypto_key  = "vault-helm-unseal-key"

Logs

k logs vault-helm-0

WARNING! Unable to read storage migration status.
2020-04-30T14:10:24.578Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-04-30T14:10:24.580Z [WARN]  storage migration check error: error="Unexpected response code: 500"

Pods

Events:
  Type     Reason     Age                    From                                                  Message
  ----     ------     ----                   ----                                                  -------
  Normal   Scheduled  5m31s                  default-scheduler                                     Successfully assigned spr-ops/vault-helm-0 to k8s-openebs-node-202003110536-1-1b
  Normal   Pulled     5m30s                  kubelet, k8s-openebs-node-202003110536-1-1b  Container image "vault:1.3.2" already present on machine
  Normal   Created    5m30s                  kubelet, k8s-openebs-node-202003110536-1-1b  Created container
  Normal   Started    5m30s                  kubelet, k8s-openebs-node-202003110536-1-1b  Started container
  Warning  Unhealthy  28s (x100 over 5m25s)  kubelet, k8s-openebs-node-202003110536-1-1b  Readiness probe failed: Error checking seal status: Get http://127.0.0.1:8200/v1/sys/seal-status: dial tcp 127.0.0.1:8200: connect: connection refused
Trenthani commented 4 years ago

I'm getting the same connection refused error as soon as I add the Consul backend to my Vault Helm deployment. I'll keep watching for a resolution. In my case this was working, along with a seal stanza using a transit key, but both of these configs broke in the last week. I changed the CNI provider to Calico, but I'm not sure this is related...

jhonsfran1165 commented 4 years ago

I'm getting the same connection refused error as well :/

Trenthani commented 4 years ago

My fix was network related: I switched back to flannel and used host-gw as the network type. I tested by switching to a standalone config to rule out any other config issues. This was all in a PoC env, so retaining data was not a concern.

RicoToothless commented 4 years ago

I used the same chart version and just changed the Vault Docker image from 1.4.0 to 1.3.1 (I only tested 1.3.1, not sure about other 1.3.x versions), and it looks like that solved the issue.

wahyudibo commented 4 years ago

Hi, I encountered this connection error as well when trying to install Vault in HA mode with Consul as the backend. I managed to get the Consul server running, but it looks like Vault is having trouble connecting to it. These are the versions I use for Consul and Vault: consul chart 0.21.0 (app version 1.7.3) and vault chart 0.6.0 (app version 1.4.2).

The guide I used for installing: https://www.vaultproject.io/docs/platform/k8s/helm/run

vault-0 logs

WARNING! Unable to read storage migration status.
2020-06-14T21:50:27.745Z [INFO]  proxy environment: http_proxy= https_proxy= no_proxy=
2020-06-14T21:50:27.746Z [WARN]  storage migration check error: error="Get http://192.168.1.210:8500/v1/kv/vault/core/migration: dial tcp 192.168.1.210:8500: connect: connection refused"

vault-helm-values.yml

server:
  affinity: ""
  ha:
    enabled: true
    config: |
      ui = true
      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }
      storage "consul" {
        path = "vault/"
        address = "HOST_IP:8500"
      }

Any advice for getting through this problem? Thanks in advance.
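
(A couple of checks that may help narrow this down — a sketch only, since the namespace and labels depend on how consul-helm was installed:)

kubectl get daemonset -l app=consul                        # the client daemonset should show one ready pod per node
kubectl get pods -o wide -l app=consul,component=client    # is there a client pod on the same node as vault-0?

If a client pod is running on the node but HOST_IP:8500 still refuses connections, check whether the client's HTTP port is actually exposed on the host (hostPort 8500) rather than only on the pod IP.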

rdeb22 commented 4 years ago

Hi @jasonodonnell

Now Vault seems to be running and unsealed, but when I try to log in I am seeing this error: "local node not active but active cluster node not found".

balajimejari commented 3 years ago

I got the same issue.

$ oc exec -it vault-0 vault status
Error checking seal status: Get "http://127.0.0.1:8200/v1/sys/seal-status": dial tcp 127.0.0.1:8200: connect: connection refused
command terminated with exit code 1

When I look into the pod logs I can see:

2020-11-04T12:23:17.945Z [WARN]  storage migration check error: error="Get "http://10.160.225.18:8500/v1/kv/vault/core/migration": dial tcp 10.160.225.18:8500: connect: connection refused"

So what I understood is: 10.160.225.18 (HOST_IP) is my worker node where the Consul server pod is running, and Vault is not connecting to the Consul server at HOST_IP on port 8500. This is in my values.yaml:

storage "consul" {
  path = "vault/"
  address = "HOST_IP:8500"
}

The workaround I did was to change HOST_IP:8500 to my Consul service. Since it is a headless service, no service IP is generated, so I used the Consul service name instead; in my case the service name is "consul-server".

balaji@DESKTOP-O8C6N39:~/vault$ oc get svc
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                                           AGE
consul-dns      ClusterIP   172.21.33.6                 53/TCP,53/UDP                                                     2d
consul-server   ClusterIP   None                        8500/TCP,8301/TCP,8301/UDP,8302/TCP,8302/UDP,8300/TCP,8600/TCP

values.yaml

storage "consul" {
  path = "vault"
  address = "consul-server:8500"
}

Then Vault was deployed and working fine. So finally my Vault HA with the Consul storage backend is working perfectly.
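
(If you use the service name like this, a quick way to confirm it resolves and answers from inside the Vault pod — a sketch, since the service name depends on your Consul release:)

oc exec -it vault-0 -- wget -qO- "http://consul-server:8500/v1/status/leader"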

PoojaSanaka commented 3 years ago

Hi @jasonodonnell, I am using the Azure managed Consul server (managed app) and I have installed the Consul agent on AKS.

kubectl get svc -n consul                                    
NAME                          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
consul-connect-injector-svc   ClusterIP   10.0.252.97   <none>        443/TCP   3d13h
consul-controller-webhook     ClusterIP   10.0.169.80   <none>        443/TCP   3d13h

As I am using the Consul agent, I do not see consul-server running. Helm chart config for Consul:

global:
  enabled: false
  name: consul
  datacenter: dc1
  acls:
    manageSystemACLs: true
    bootstrapToken:
      secretName: XXX-sandbox-managed-app-bootstrap-token
      secretKey: token
  gossipEncryption:
    secretName: XXX-sandbox-managed-app-hcs
    secretKey: gossipEncryptionKey
  tls:
    enabled: true
    enableAutoEncrypt: true
    caCert:
      secretName: XXX-sandbox-managed-app-hcs
      secretKey: caCert
externalServers:
  enabled: true
  hosts:
    ['XXX.az.hashicorp.cloud']
  httpsPort: 443
  useSystemRoots: true
  k8sAuthMethodHost: https://XXX.uksouth.azmk8s.io:443
client:
  enabled: true
  # If you are using Kubenet in your AKS cluster (the default network),
  # uncomment the line below.
  # exposeGossipPorts: true
  join:
    ['XXX.az.hashicorp.cloud']
connectInject:
  enabled: true
controller:
  enabled: true

Helm chart config for Vault:

ui:
  enabled: true
  serviceType: LoadBalancer

server:
  ingress:
    enabled: true
    extraPaths:
      - path: /
        backend:
          serviceName: vault-ui
          servicePort: 8200
    hosts:
      - host: something.com
  ha:
    enabled: true
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
      }

      storage "consul" {
        path = "vault/"
        scheme = "https"
        address = "HOST_IP:8500"
      }

Error in the Vault pod, which is unable to connect to the Consul agent:

kubectl logs vault-0 -n vault 

WARNING! Unable to read storage migration status.
2021-06-28T08:13:13.041Z [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2021-06-28T08:13:13.042Z [WARN]  storage migration check error: error="Get "https://127.0.0.1:8500/v1/kv/vault/core/migration": dial tcp 127.0.0.1:8500: connect: connection refused"

I am not sure if some configuration is missing in the Consul Helm chart, as I do not see any service running on port 8500 in the consul namespace.

Any suggestion would be much appreciated.

Thanks, pooja

salmanb commented 3 years ago

I was able to get this working by deploying the Consul server as a deployment of its own. I didn't enable consul-client because I don't want a daemonset deployed. This uses agent tokens and Consul policies (might not be required depending on how you deploy).

Instead, in the vault chart, I set these in the appropriate place:

extraInitContainers:
    - name: consul-config-writer
      image: "alpine"
      command: [sh, -c]
      resources:
        requests:
          memory: 256Mi
          cpu: 250m                                                                                                                                                                                                                                                                                                          
        limits:
          memory: 256Mi
          cpu: 250m
      args:
        - 'cd /consul-config && echo "{ \"primary_datacenter\": \"dc1\", \"acl\" : { \"enabled\": true, \"default_policy\": \"allow\", \"down_policy\": \"extend-cache\", \"enable_token_persistence\": true, \"tokens\": { \"default\": \"<agent_token>\" } } }" | tee agent.json && ls -l /consul-config'
      volumeMounts:
        - name: consul-config
          mountPath: /consul-config

# extraContainers is a list of sidecar containers. Specified as a YAML list.                                                                                                                                                                                                                                               
  extraContainers:    
    - name: consul    
      image: hashicorp/consul:1.10.0    
      volumeMounts:    
        - name: consul-config    
          mountPath: /consul-config    
      args:    
        - /bin/consul       
        - agent    
        - -join    
        - consul-consul-server    
        - -data-dir=/tmp/consul    
        - -encrypt    
        - <gossip_key>    
        - -config-file=/consul-config/agent.json    

        storage "consul" {
           path = "vault"
           address = "127.0.0.1:8500"
       }

My vault and consul pods are in the same namespace. If you're using different namespaces, then you'll probably need to provide the full service FQDN for your consul agent to join.