hashicorp / vault-helm

Helm chart to install Vault and other associated components.

The Vault service in HA mode with raft storage does not support load balancing. #457

Open anoncam opened 3 years ago

anoncam commented 3 years ago

Describe the bug
The Vault service in HA mode does not support load balancing.

To Reproduce
Steps to reproduce the behavior:

  1. Install chart in HA mode on an HA Kubernetes Cluster
  2. Configure Istio properly (an entirely separate set of issues, not covered here; see the Istio sidecar annotations in the values file) and create a VirtualService for Vault.
  3. curl or use the vault CLI to work with raft snapshots (see the example after this list)
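
For step 3, a minimal sketch of the snapshot call, assuming VAULT_ADDR points at the load-balanced Vault service and VAULT_TOKEN holds a token permitted to read sys/storage/raft/snapshot (both values here are illustrative):

  # via the CLI
  export VAULT_ADDR=http://vault.vault.svc.cluster.local:8200   # assumed service address
  vault operator raft snapshot save /tmp/raft.snap

  # or the equivalent raw API call
  curl --header "X-Vault-Token: $VAULT_TOKEN" \
       --output /tmp/raft.snap \
       "$VAULT_ADDR/v1/sys/storage/raft/snapshot"

As the comments below describe, the failure appears when such calls are routed to a standby member instead of the active node.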

Expected behavior
HA capability to interact with raft storage.

Environment

Chart values: (vault config)

---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: vault
  namespace: vault
spec:
  targetNamespace: vault
  releaseName: vault
  interval: 10m
  chart:
    spec:
      chart: chart
      sourceRef:
        kind: GitRepository
        name: vault
  install:
    remediation:
      retries: 5
  upgrade:
    remediation:
      retries: 5
      remediateLastFailure: true
    cleanupOnFail: true
  rollback:
    timeout: 10m
    cleanupOnFail: false
  values:

    # Vault Helm Chart Value Overrides
    global:
      enabled: true
      tlsDisable: true
      imagePullSecrets:
        - name: private-registry

    injector:
      enabled: false
      # Use the Vault K8s Image https://github.com/hashicorp/vault-k8s/
      image:
        repository: "registry1.dso.mil/ironbank/hashicorp/vault/vault-k8s"
        tag: v0.6.0

      resources:
          requests:
            memory: 256Mi
            cpu: 250m
          limits:
            memory: 256Mi
            cpu: 250m

    server:

      dataStorage:
        enabled: true
        size: 50Gi
        mountPath: "/vault/data"
        accessMode: ReadWriteOnce

      postStart:
      - /bin/sh
      - -c
      - 'echo "libevmulti_init: Ready " > /tmp/cloudhsm_client_start.log'

      # The following annotations are required to exempt raft traffic from the Istio sidecar;
      # otherwise the Envoy sidecar will interrupt raft member-to-member TLS communication.
      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "8201"
        traffic.sidecar.istio.io/excludeOutboundPorts: "8201"
      image:
        # Enterprise Image - license required
        repository: "registry1.dsop.io/ironbank/hashicorp/turbog"
        tag: "1.6.1-hsm-cloudhsm"
        # tag: "1.5.0_ent" 

        #IB-Enterprise 
        # repository: "registry1.dsop.io/ironbank/hashicorp/secure-secrets-management/vault-enterprise"
        # tag: "1.5.3"

      # The Following Resource Limits are in line with node requirements in the
      # Vault Reference Architecture for a Small Cluster

      # PROD
      resources:
        requests:
          memory: 8Gi
          cpu: 2000m
        limits:
          memory: 16Gi
          cpu: 2000m

      ingress:
        enabled: false

      # For HA configuration and because we need to manually init the vault,
      # we need to define custom readiness/liveness Probe settings
      readinessProbe:
        enabled: false
        path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204" #orinal setting
      livenessProbe:
        enabled: false
        path: "/v1/sys/health?standbyok=true"
        initialDelaySeconds: 60

      # extraEnvironmentVars is a list of extra environment variables to set with the stateful set. These could be
      # used to include variables required for auto-unseal.
      extraEnvironmentVars:
        VAULT_API_ADDR: http://vault-internal:8200
        VAULT_ADDR:  http://127.0.0.1:8200

      # This configures the Vault Statefulset to create a PVC for audit logs.
      # See https://www.vaultproject.io/docs/audit/index.html to know more
      auditStorage:
        enabled: true

      # Run Vault in "HA" mode.
      ha:
        enabled: true
        replicas: 5

        raft:
          enabled: true
          setNodeId: true

          # config file encrypted in vault-values.enc.yaml 
          config: |
            ui = true
            disable_mlock = true
            log_level = "Trace"

            listener "tcp" {
                address = "[::]:8200"
                cluster_address = "[::]:8201" 
                tls_disable = 1
            }

            storage "raft" {
                path = "/vault/data"
                retry_join {
                  leader_api_addr = "http://vault-0.vault-internal:8200"
                }
                retry_join {
                  leader_api_addr = "http://vault-1.vault-internal:8200"
                }
                retry_join {
                  leader_api_addr = "http://vault-2.vault-internal:8200"
                }
                retry_join {
                  leader_api_addr = "http://vault-3.vault-internal:8200"
                }
                retry_join {
                  leader_api_addr = "http://vault-4.vault-internal:8200"
                }
            }

            seal "awskms" {
                region     = "us-gov-west-1"
                kms_key_id = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
                endpoint   = "https://kms.us-gov-west-1.amazonaws.com"
            }

            entropy "seal" {
              mode = "augmentation"
            }

            service_registration "kubernetes" {}

    # Vault UI
    ui:
      enabled: true
      serviceType: "ClusterIP"
      serviceNodePort: null
      externalPort: 8200
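
For reference, with server.ha.raft enabled the chart creates several Kubernetes services; a quick way to see what this release produced (names assume the release and namespace "vault" used above, so treat the expected output as a sketch):

  kubectl -n vault get svc
  # expect something along the lines of: vault, vault-active, vault-standby,
  # vault-internal and vault-ui. vault-active selects only the current leader,
  # and the headless vault-internal service backs the vault-N.vault-internal
  # addresses used in the retry_join stanzas above.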

Additional context

The Istio VirtualService

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: vault
  namespace: vault
  labels:
    kustomize.toolkit.fluxcd.io/checksum: 10e40553b58e53b5e14ee98481ade71bbd2a77d6
    kustomize.toolkit.fluxcd.io/name: vault-deploy
    kustomize.toolkit.fluxcd.io/namespace: vault
    owner: vault
spec:
  gateways:
  - main.istio-system.svc.cluster.local
  hosts:
  - our.sensitive.domain
  http:
  - route:
    - destination:
        host: vault-active.vault.svc.cluster.local
        port:
          number: 8200

Once the VirtualService was changed to point to vault-active, everything worked as expected, but we now have 4 standby pods sitting idle, which isn't really HA anymore.
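
A quick, hedged way to confirm which pod is currently active, assuming service_registration "kubernetes" is enabled as in the config above (it labels each pod with its HA role):

  kubectl -n vault get pods -L vault-active,vault-sealed
  # or select the leader directly
  kubectl -n vault get pods -l vault-active=true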

anoncam commented 3 years ago

To clarify: the issue only occurs when interacting with the raft API endpoints.
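
A simple raft call that exercises the same API surface, useful for checking whether the raft endpoints are reachable through a given address (a sketch; assumes VAULT_ADDR and a token with access to the raft configuration endpoint):

  vault operator raft list-peers
  # lists the five raft members and shows which one is the leader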

ngarafol commented 3 years ago

Not sure I follow you, so bear with me: you have 5 nodes in an HA cluster, you use Istio, and you wanted to do what? Load-balance across all 5 at once? That will not work by Vault's design - there is one active node (the leader) and the rest of the nodes are standbys that forward requests to the active node.

To be highly available, one of the Vault server nodes grabs a lock within the data store. The successful server node then becomes the active node; all other nodes become standby nodes. At this point, if the standby nodes receive a request, they will either forward the request or redirect the client depending on the current configuration and state of the cluster -- see the sections below for details. Due to this architecture, HA does not enable increased scalability. In general, the bottleneck of Vault is the data store itself, not Vault core. For example: to increase the scalability of Vault with Consul, you would generally scale Consul instead of Vault.

source: https://www.vaultproject.io/docs/concepts/ha#high-availability-mode-ha
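
A small way to see this behaviour per node, assuming the commands run from a pod inside the cluster so the vault-internal headless-service DNS names resolve: /v1/sys/health answers 200 on the active node and 429 on an unsealed standby (unless standbyok=true is passed, as in the probe paths above).

  for i in 0 1 2 3 4; do
    curl -s -o /dev/null -w "vault-$i: %{http_code}\n" \
      "http://vault-$i.vault-internal.vault.svc.cluster.local:8200/v1/sys/health"
  done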

reidlai commented 3 years ago

Based on the vault chart templates, the chart should use server.ha.replicas to set the StatefulSet replica count when server.dev.enabled and server.standalone.enabled are both false. Even if your point is correct, the replica count of the server StatefulSet should at least match server.ha.replicas. I am struggling to change the StatefulSet replica count using this Helm chart.

evhiness commented 3 years ago

Changing ha.replicas works for me to scale the service; the template does use that value correctly, as can be seen in the repository. I think the OP is confusing horizontal scalability with high availability.
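
For what it's worth, a minimal sketch of driving the replica count from that chart value when installing with plain Helm (release and chart names here are assumptions; with the Flux HelmRelease above, the equivalent is the server.ha.replicas: 5 line already present in the values block):

  helm upgrade vault hashicorp/vault \
    --namespace vault \
    --set server.ha.enabled=true \
    --set server.ha.raft.enabled=true \
    --set server.ha.replicas=5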