hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Vault 1.16.x Helm deployment with MySQL storage: pods end up in separate groups #27301

Open ValeriiVozniuk opened 4 months ago

ValeriiVozniuk commented 4 months ago

Describe the bug
Running vault operator members in the pods shows that the pods form independent groups and are sometimes unable to find the active cluster.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy Vault 1.16.x via the Helm chart with the MySQL backend instead of Raft. Use 1 pod per node (6 in my case).
  2. sh into each pod, run vault login, and provide the root token when prompted.
  3. If the step above succeeds, run vault operator members (see the sketch after this list).
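
A minimal sketch of steps 2-3, assuming the chart's default pod names (vault-0 through vault-5) and the vault namespace used in the commands later in this thread:

for i in 0 1 2 3 4 5; do
  # paste the root token when vault login prompts for it
  kubectl -n vault exec -it vault-$i -- vault login
  kubectl -n vault exec -it vault-$i -- vault operator members
done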

Expected behavior
No errors while running the commands above, and the command in step 3 lists all 6 "nodes".

Actual behavior

  1. vault login sometimes produces an error:

    Error making API request.

    URL: GET http://127.0.0.1:8200/v1/sys/ha-status
    Code: 500. Errors:

Environment:

Vault server configuration file(s):

injector:
  enabled: false
csi:
  enabled: true
  agent:
    image:
      repository: "hashicorp/vault"
      tag: "1.16.3"
server:
  affinity: ""
  image:
    repository: "hashicorp/vault"
    tag: "1.16.3"
  ha:
    enabled: true
    replicas: 1
    config: |
      ui = true

      listener "tcp" {
        tls_disable = 1
        address = "[::]:8200"
        cluster_address = "[::]:8201"
        # Enable unauthenticated metrics access (necessary for Prometheus Operator)
        telemetry {
          unauthenticated_metrics_access = "true"
        }
      }

      service_registration "kubernetes" {}

      telemetry {
        prometheus_retention_time = "30s"
        disable_hostname = true
      }

  extraVolumes:
    - type: secret
      name: vault-mysql

  extraArgs: "-config=/vault/userconfig/vault-mysql/mysql-config.hcl"

  postStart:
    - /bin/sh
    - -c
    - if [ -n "${TOKENS}" ]; then sleep 30; token1=$(echo "${TOKENS}" | cut -d',' -f1); token2=$(echo "${TOKENS}" | cut -d',' -f2); vault operator unseal "${token1}"; vault operator unseal "${token2}"; fi
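
The mysql-config.hcl mounted from the vault-mysql secret is not included in the report. A minimal sketch of what it likely contains, based on Vault's documented mysql storage backend parameters; the address, credentials, and database name here are placeholders, and ha_enabled must be "true" for Vault to use MySQL for HA coordination:

# Hypothetical /vault/userconfig/vault-mysql/mysql-config.hcl (values are placeholders)
storage "mysql" {
  address    = "mariadb.dc-local.example:3306"  # the report says each pod targets its local in-DC Galera node
  username   = "vault"
  password   = "..."
  database   = "vault"
  ha_enabled = "true"                           # required for HA lock/leader election over MySQL
}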

Additional context
We started to see this problem after updating from 1.15.6 to 1.16.2: sometimes the vault pods start but produce errors and are unable to serve secrets to clients, or they serve stale data, not seeing a newly enabled auth method, updated access policy rules, etc. Looking into the pod logs, we saw various errors like

2024-05-31T09:38:51.999Z [ERROR] core: forward request error: error="error during forwarding RPC request"
2024-05-31T09:38:53.097Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp: lookup vault-5.vault-internal on 10.43.0.10:53: no such host\""

where pods like vault-2 are trying to find vault-5, as here. We also noticed the issue described above with the vault login/vault operator members commands. Before updating to 1.16.2 we had none of these issues.
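
A quick way to confirm the DNS symptom from inside an affected pod (a sketch; the pod and headless-service names follow the chart defaults, and nslookup is assumed to be available in the image via busybox):

# From the pod that logged the failure, try to resolve the peer it could not find
kubectl -n vault exec vault-2 -- nslookup vault-5.vault-internal

A missing record here would match the "no such host" error in the log above.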

More details about our architecture:

  1. We have a "stretch" Kubernetes cluster that spans 2 DCs, and we want Vault to serve requests locally without going cross-DC. That is why we chose the mysql backend: in a network split, Raft could not survive with an equal number of Vault pods on each side.
  2. The backend is a MariaDB Galera cluster with 4 nodes in all-active mode. Each Vault pod connects to the DB node local to its DC and works with it.
  3. vault status on all nodes shows the correct Cluster Name/Cluster ID (see the check after this list).
  4. If any of the pods goes down, a "reshuffling" happens. For example, there was a group of 3 "nodes": vault-1, vault-3, vault-5. The vault-4 pod went down for a restart, and all of a sudden the group above broke apart, each of those "nodes" now being "independent".
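
A sketch of the per-pod check from item 3 above, assuming the pod names used elsewhere in this thread:

# Every pod should report the same Cluster Name and Cluster ID
for i in 0 1 2 3 4 5; do
  kubectl -n vault exec vault-$i -- vault status | grep -E 'Cluster (Name|ID)'
done
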
ValeriiVozniuk commented 4 months ago

I've run some tests with the previous release, 1.15.6, and it behaves a lot better. From the start it forms 5+1 node groups:

01:~$ kubectl -n vault exec -it vault-1 -- vault operator members
Host Name    API Address              Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------              ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-1      http://10.42.0.9:8200    https://vault-1.vault-internal:8201    true           1.15.6     n/a                n/a                n/a
01:~$ kubectl -n vault exec -it vault-3 -- vault operator members
Host Name    API Address               Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------               ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-3      http://10.42.1.12:8200    https://vault-3.vault-internal:8201    true           1.15.6     n/a                n/a                n/a
vault-2      http://10.42.2.12:8200    https://vault-2.vault-internal:8201    false          1.15.6     n/a                n/a                2024-06-04T09:33:22Z
vault-4      http://10.42.3.13:8200    https://vault-4.vault-internal:8201    false          1.15.6     n/a                n/a                2024-06-04T09:33:22Z
vault-5      http://10.42.4.10:8200    https://vault-5.vault-internal:8201    false          1.15.6     n/a                n/a                2024-06-04T09:33:21Z
vault-0      http://10.42.5.13:8200    https://vault-0.vault-internal:8201    false          1.15.6     n/a                n/a                2024-06-04T09:33:26Z

It feels like even with a database backend, Vault tends to form uneven groups, much as it does with the raft backend. 1.15.6 handles pod restarts better, at some point having all 6 pods in a single cluster, but then it splits into 5+1 again.

Any ideas why it does this and cannot keep all the nodes in a single group?
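
One way to compare each pod's view of the leader is the unauthenticated sys/leader endpoint (a sketch; wget here is the busybox applet assumed to be present in the image):

# Each pod reports whether it is a standby and which leader address it sees
for i in 0 1 2 3 4 5; do
  kubectl -n vault exec vault-$i -- wget -qO- http://127.0.0.1:8200/v1/sys/leader
  echo
done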

ValeriiVozniuk commented 3 months ago

The same happens with the fresh 1.17.1 release:

01:~$ k exec -it vault-0 -- vault operator members
Host Name    API Address               Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------               ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-0      http://10.42.3.31:8200    https://vault-0.vault-internal:8201    true           1.17.1     n/a                n/a                n/a
01:~$ k exec -it vault-1 -- vault operator members
Host Name    API Address               Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------               ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-1      http://10.42.4.31:8200    https://vault-1.vault-internal:8201    true           1.17.1     n/a                n/a                n/a
01:~$ k exec -it vault-2 -- vault operator members
Host Name    API Address               Cluster Address                        Active Node    Version    Upgrade Version    Redundancy Zone    Last Echo
---------    -----------               ---------------                        -----------    -------    ---------------    ---------------    ---------
vault-2      http://10.42.0.37:8200    https://vault-2.vault-internal:8201    false          1.17.1     n/a                n/a                2024-06-27T08:45:27Z
vault-3      http://10.42.1.38:8200    https://vault-3.vault-internal:8201    false          1.17.1     n/a                n/a                2024-06-27T08:45:27Z
vault-5      http://10.42.2.32:8200    https://vault-5.vault-internal:8201    false          1.17.1     n/a                n/a                2024-06-27T08:45:27Z
vault-4      http://10.42.5.34:8200    https://vault-4.vault-internal:8201    true           1.17.1     n/a                n/a                n/a