hashicorp / vault-helm

Helm chart to install Vault and other associated components.
Mozilla Public License 2.0

The vault service in HA mode with raft storage does not support loadbalancing. #457

Open anoncam opened 3 years ago

anoncam commented 3 years ago

Describe the bug

The Vault service in HA mode does not support load balancing.

To Reproduce

Steps to reproduce the behavior:

  1. Install chart in HA mode on an HA Kubernetes Cluster
  2. Configure Istio properly (an entirely separate set of issues not related to this issue...see Istio sidecar annotations in the values file), and create a VirtualService for Vault.
  3. curl or use the vault cli to enable raft snapshots

Expected behavior

HA capability to interact with raft storage.


Chart values: (vault config)

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
  name: vault
  namespace: vault
  targetNamespace: vault
  releaseName: vault
  interval: 10m
      chart: chart
        kind: GitRepository
        name: vault
      retries: 5
      retries: 5
      remediateLastFailure: true
    cleanupOnFail: true
    timeout: 10m
    cleanupOnFail: false

    # Vault Helm Chart Value Overrides
      enabled: true
      tlsDisable: true
        - name: private-registry

      enabled: false
      # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
        repository: "registry1.dso.mil/ironbank/hashicorp/vault/vault-k8s"
        tag:  v0.6.0

            memory: 256Mi
            cpu: 250m
            memory: 256Mi
            cpu: 250m


        enabled: true
        size: 50Gi
        mountPath: "/vault/data"
        accessMode: ReadWriteOnce

      - /bin/sh
      - -c
      - 'echo "libevmulti_init: Ready " > /tmp/cloudhsm_client_start.log'

      # the following annotations are required to exempt traffic from the raft protocol
      # otherwise, envoy side-car will interrupt raft-member to raft-member tls communications
        traffic.sidecar.istio.io/excludeInboundPorts: "8201"
        traffic.sidecar.istio.io/excludeOutboundPorts: "8201"
        # Enterprise Image - license required
        repository: "registry1.dsop.io/ironbank/hashicorp/turbog"
        tag: "1.6.1-hsm-cloudhsm"
        # tag: "1.5.0_ent" 

        # repository: "registry1.dsop.io/ironbank/hashicorp/secure-secrets-management/vault-enterprise"
        # tag: "1.5.3"

      # The Following Resource Limits are in line with node requirements in the
      # Vault Reference Architecture for a Small Cluster

      # PROD
          memory: 8Gi
          cpu: 2000m
          memory: 16Gi
          cpu: 2000m

        enabled: false

      # For HA configuration and because we need to manually init the vault,
      # we need to define custom readiness/liveness Probe settings
        enabled: false
        path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204" # original setting
        enabled: false
        path: "/v1/sys/health?standbyok=true"
        initialDelaySeconds: 60

      # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
      # used to include variables required for auto-unseal.
        VAULT_API_ADDR: http://vault-internal:8200

      # This configures the Vault Statefulset to create a PVC for audit logs.
      # See https://www.vaultproject.io/docs/audit/index.html to know more
        enabled: true

      # Run Vault in "HA" mode.
        enabled: true
        replicas: 5

          enabled: true
          setNodeId: true

          # config file encrypted in vault-values.enc.yaml 
          config: |
            ui = true
            disable_mlock = true
            log_level = "Trace"

            listener "tcp" {
                address = "[::]:8200"
                cluster_address = "[::]:8201" 
                tls_disable = 1

            storage "raft" {
                path = "/vault/data"
                retry_join {
                  leader_api_addr = "http://vault-0.vault-internal:8200"
                retry_join {
                  leader_api_addr = "http://vault-1.vault-internal:8200"
                retry_join {
                  leader_api_addr = "http://vault-2.vault-internal:8200"
                retry_join {
                  leader_api_addr = "http://vault-3.vault-internal:8200"
                retry_join {
                  leader_api_addr = "http://vault-4.vault-internal:8200"

            seal "awskms" {
                region     = "us-gov-west-1"
                kms_key_id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                endpoint   = "https://kms.us-gov-west-1.amazonaws.com"

            entropy "seal" {
              mode = "augmentation"

            service_registration "kubernetes" {}

    # Vault UI
      enabled: true
      serviceType: "ClusterIP"
      serviceNodePort: null
      externalPort: 8200

Additional context

The Istio VirtualService

apiVersion: v1
- apiVersion: networking.istio.io/v1beta1
  kind: VirtualService
      kubectl.kubernetes.io/last-applied-configuration: |
    creationTimestamp: "2021-01-25T20:52:29Z"
    generation: 3
      kustomize.toolkit.fluxcd.io/checksum: 10e40553b58e53b5e14ee98481ade71bbd2a77d6
      kustomize.toolkit.fluxcd.io/name: vault-deploy
      kustomize.toolkit.fluxcd.io/namespace: vault
      owner: vault
    - apiVersion: networking.istio.io/v1beta1
      fieldsType: FieldsV1
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
            .: {}
            f:kustomize.toolkit.fluxcd.io/checksum: {}
            f:kustomize.toolkit.fluxcd.io/name: {}
            f:kustomize.toolkit.fluxcd.io/namespace: {}
            f:owner: {}
          .: {}
          f:gateways: {}
          f:hosts: {}
          f:http: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2021-01-25T20:52:29Z"
    name: vault
    namespace: vault
    resourceVersion: "8111567"
    selfLink: /apis/networking.istio.io/v1beta1/namespaces/vault/virtualservices/vault
    uid: 22c8d5af-ab80-470a-9c83-0034e65d41a2
    - main.istio-system.svc.cluster.local
    - our.sensitive.domain
    - route:
      - destination:
          host: vault-active.vault.svc.cluster.local
            number: 8200
kind: List
  resourceVersion: ""
  selfLink: ""

Once the virtual service was changed to point to vault-active everything worked as expected, but we have 4 stale pods, which isn't really HA anymore.
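For readers who want to confirm which pod is currently active before routing to it, here is a minimal sketch. It assumes you have already collected each pod's JSON response from Vault's `/v1/sys/leader` endpoint. The function name `find_active_pod` and the sample payloads are hypothetical and are not taken from the cluster in this issue.

```python
from typing import Mapping, Optional


def find_active_pod(leader_status: Mapping[str, Mapping[str, object]]) -> Optional[str]:
    """Return the pod whose /v1/sys/leader payload reports is_self == true.

    `leader_status` maps a pod name to the decoded JSON body that pod
    returned from /v1/sys/leader. Returns None if no pod claims leadership.
    """
    for pod, payload in leader_status.items():
        if payload.get("ha_enabled") and payload.get("is_self"):
            return pod
    return None


# Illustrative payloads only (assumed shape: ha_enabled, is_self, leader_address).
sample = {
    "vault-0": {
        "ha_enabled": True,
        "is_self": False,
        "leader_address": "http://vault-1.vault-internal:8200",
    },
    "vault-1": {
        "ha_enabled": True,
        "is_self": True,
        "leader_address": "http://vault-1.vault-internal:8200",
    },
}

print(find_active_pod(sample))
```

The pods that report `is_self` as false are the standbys; they remain available to take over if the active pod fails.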

anoncam commented 3 years ago

To clarify: the issue only occurs when interacting with the raft API endpoints.
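As a rough illustration of the kind of raft API call involved, the sketch below builds (but does not send) a request to Vault's raft snapshot endpoint, `/v1/sys/storage/raft/snapshot`. It targets the `vault-active` service from the VirtualService above. The helper name `snapshot_request` and the token value are placeholders, not anything from the original report.

```python
import urllib.request


def snapshot_request(vault_addr: str, token: str) -> urllib.request.Request:
    """Build a GET request for a raft snapshot; the caller decides when to send it."""
    url = vault_addr.rstrip("/") + "/v1/sys/storage/raft/snapshot"
    return urllib.request.Request(
        url,
        headers={"X-Vault-Token": token},  # standard Vault auth header
        method="GET",
    )


# Placeholder token; the address assumes TLS is disabled, as in the values above.
req = snapshot_request(
    "http://vault-active.vault.svc.cluster.local:8200",
    "s.placeholder-token",
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` and writing the response body to a file.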

ngarafol commented 3 years ago

Not sure I follow you, so bear with me: you have 5 nodes in an HA cluster, you use Istio, and you wanted to do what? Load-balance across all 5 at once? That will not work, by Vault's design: there is one active node (the leader), and the rest are standby nodes that forward requests to the active node.

To be highly available, one of the Vault server nodes grabs a lock within the data store. The successful server node then becomes the active node; all other nodes become standby nodes. At this point, if the standby nodes receive a request, they will either forward the request or redirect the client depending on the current configuration and state of the cluster -- see the sections below for details. Due to this architecture, HA does not enable increased scalability. In general, the bottleneck of Vault is the data store itself, not Vault core. For example: to increase the scalability of Vault with Consul, you would generally scale Consul instead of Vault.

source: https://www.vaultproject.io/docs/concepts/ha#high-availability-mode-ha
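A load balancer or probe can tell active and standby nodes apart from the status code of `/v1/sys/health`. The sketch below encodes the default codes documented for that endpoint. The function names `node_role` and `accepts_traffic` are hypothetical helpers for illustration only.

```python
# Default /v1/sys/health status codes (no query parameters such as standbyok).
HEALTH_CODES = {
    200: "active",               # initialized, unsealed, active
    429: "standby",              # unsealed, standby
    472: "dr-secondary",         # disaster recovery secondary
    473: "performance-standby",  # performance standby
    501: "uninitialized",
    503: "sealed",
}


def node_role(status_code: int) -> str:
    """Map a /v1/sys/health status code to a coarse node role."""
    return HEALTH_CODES.get(status_code, "unknown")


def accepts_traffic(status_code: int, active_only: bool = True) -> bool:
    """Decide whether a backend should receive requests.

    With active_only=True only the active node qualifies; otherwise
    standbys qualify too, relying on them to forward to the active node.
    """
    role = node_role(status_code)
    if active_only:
        return role == "active"
    return role in ("active", "standby")


print(node_role(429), accepts_traffic(429), accepts_traffic(429, active_only=False))
```

Note that a probe path with `standbyok=true`, as in the values above, makes standbys answer 200 as well, so this mapping only applies to the default codes.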

reidlai commented 3 years ago

Based on the Vault chart template, the chart should use server.ha.replicas to set the StatefulSet replica count when server.dev.enabled and server.standalone.enabled are false. Even supposing your viewpoint is correct, the replica count in the server StatefulSet should at least match the value of server.replicas. I am struggling to work out how to change the StatefulSet replicas using this Helm chart.

evhiness commented 3 years ago

Changing ha.replicas works for me to scale the service. The template does indeed use that value correctly as can be seen in the repository. I think the OP is confusing horizontal scalability with high availability.
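To make the replica selection discussed in this thread concrete, here is a simplified sketch of the condition as the comments describe it: `server.ha.replicas` applies when dev and standalone modes are both disabled. It is not the chart's actual template, and the function name `statefulset_replicas` and the single-replica fallback are assumptions.

```python
def statefulset_replicas(server: dict) -> int:
    """Pick a replica count from a `server` values mapping.

    Simplified model of the behavior described in this thread,
    not a copy of the chart's template logic.
    """
    dev_enabled = bool(server.get("dev", {}).get("enabled"))
    standalone_enabled = bool(server.get("standalone", {}).get("enabled"))
    if dev_enabled or standalone_enabled:
        return 1  # assumption: single-replica modes
    ha = server.get("ha", {})
    if ha.get("enabled"):
        return int(ha["replicas"])
    return 1  # assumption: fall back to one replica


# Values shaped like the ones in the original report (HA enabled, 5 replicas).
values = {
    "dev": {"enabled": False},
    "standalone": {"enabled": False},
    "ha": {"enabled": True, "replicas": 5},
}

print(statefulset_replicas(values))
```

Scaling the StatefulSet then amounts to changing `server.ha.replicas` in the values and upgrading the release, which is consistent with the report above that changing ha.replicas works.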