hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Error handshaking cluster connection, tls : internal error #16010

Open 0xArKrant opened 2 years ago

0xArKrant commented 2 years ago

Environment:

Vault Config File:

LEADER CONFIGURATION :

storage "raft" {
   path    = "/opt/raft/data/"
   node_id = "node_1.[REDACTED]"
}

listener "tcp" {
   address = "node_1.[REDACTED]:8200"
   cluster_address = "node_1.[REDACTED]:8201"
   tls_disable = "true"
}

FAILING NODE CONFIGURATION :

storage "raft" {
   path    = "/opt/raft/data/"
   node_id = "node_6.[REDACTED]"
}

listener "tcp" {
   address = "node_6.[REDACTED]:8200"
   cluster_address = "node_6.[REDACTED]:8201"
   tls_disable = "true"
}

Startup Log Output: LEADER

Jun 16 09:40:26 node_1.[REDACTED] vault[533555]: 2022-06-16T09:40:26.493+0200 [ERROR] storage.raft: failed to heartbeat to: peer=node_6.[REDACTED]:8201 error="remote error: tls: internal error"
Jun 16 09:40:29 node_1.[REDACTED] vault[533555]: 2022-06-16T09:40:29.082+0200 [DEBUG] core.cluster-listener: creating rpc dialer: address=node_6.[REDACTED]:8201 alpn=raft_storage_v1 host=raft-23aeea35-07f4-04b2-5458-b58af65a327b
Jun 16 09:40:29 node_1.[REDACTED] vault[533555]: 2022-06-16T09:40:29.083+0200 [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter node_6.[REDACTED] node_6..[REDACTED]:8201}" error="remote error: tls: internal error"

FAILING NODE

Jun 16 09:37:24 node_6.[REDACTED] vault[114533]: 2022-06-16T09:37:24.149+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 09:37:24 node_6.[REDACTED] vault[114533]: 2022-06-16T09:37:24.149+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"

Expected Behavior: node_6 successfully joins node_1, which is the leader, and syncs the Vault storage to the new node, like the other 4 nodes (see below).

Node                          Address                            State       Voter
----                          -------                            -----       -----
node_1.[REDACTED]            node_1.[REDACTED]:8201                      leader      true
node_2.[REDACTED]            node_2.[REDACTED]:8201                      follower    true
node_3.[REDACTED]            node_3.[REDACTED]:8201                      follower    true
node_5.[REDACTED]            node_5.[REDACTED]:8201                      follower    true
node_4.[REDACTED]            node_4.[REDACTED]:8201                      follower    true
node_6.[REDACTED]            node_6.[REDACTED]:8201                      follower    false

Actual Behavior: I've been working on this issue for a week now, with TLS certificates and without, and it is the same error. It's completely random: sometimes it works (like node_2/3/4/5) and sometimes it doesn't (like node_6). If I reinstall Vault on another node (node_2 for example) there is a possibility I hit the same issue; as I said, it's random. I haven't been able to find the reason yet, so I'm asking for help. node_6 stays sealed with the TLS error and appears as a non-voter, so I'm unable to use it.

Steps to Reproduce: I followed the steps in this TUTORIAL

Important Factoids: No internet, only local

References: VERY OLD ISSUE, STACK OVERFLOW ISSUE, DISCUSS HASHICORP

cipherboy commented 2 years ago

@0xArKrant I don't have any specific thoughts for this, but note two things that might help you find next steps:

Here, from your logs, we see that the leader is attempting to initiate a connection to the failing node with the next protocol (ALPN) of raft_storage_v1. The failure occurs because the target node appears to have no TLS configuration registered for the raft_storage_v1 protocol and therefore aborts the TLS connection.
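
A quick way to sanity-check this, as a sketch assuming openssl is available and using node_6.example as a placeholder for the redacted hostname: connect to the failing node's cluster port and request that ALPN protocol.

# Hypothetical check against the failing node's cluster port. A node in the
# broken state rejects the handshake at ALPN selection (matching the
# "no TLS config found for ALPN" warning above); a healthy follower at least
# negotiates raft_storage_v1, even if the session later fails on
# client-certificate verification.
openssl s_client -connect node_6.example:8201 -alpn raft_storage_v1 </dev/null 2>/dev/null | grep -i alpn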

Maybe some more logs from earlier in the failed node's startup would give context as to why we failed to provision this config? Does this occur only with 1.10.3, or do older versions (say, a 1.9.x release) show the same error?

@ncabatoff Have you seen this before? My initial suspicion was a wrong port in the config, but the fact that it works and then randomly stops working afterwards is weird (with no apparent config changes). Also, it appears to come from the cluster-listener logger, which makes me think the config is indeed correct.

0xArKrant commented 2 years ago

@cipherboy Hi, thank you for answering. Currently we are using Vault v0.10.4 (a very old one) with Consul v1.0.6 as storage, so we are migrating to a much newer version (1.10.3).

I will try setting up a full cluster with an earlier version than the latest (1.9 for example) and update you with the result.

Otherwise, here is the full log, from startup through joining:

Jun 16 17:00:42 node_6.[REDACTED] systemd[1]: vault.service: Succeeded.
Jun 16 17:00:42 node_6.[REDACTED] systemd[1]: Stopped Vault secret management tool.
Jun 16 17:03:38 node_6.[REDACTED] systemd[1]: Starting Vault secret management tool...
Jun 16 17:03:38 node_6.[REDACTED] systemd[1]: Started Vault secret management tool.
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: ==> Vault server configuration:
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:              Api Address: http://node_6.[REDACTED]:8200
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:                      Cgo: disabled
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:          Cluster Address: https://node_6.[REDACTED]:8201
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:               Go Version: go1.17.9
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:               Listener 1: tcp (addr: "node_6.[REDACTED]:8200", cluster address: "node_6.[REDACTED]:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:                Log Level: debug
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:                    Mlock: supported: true, enabled: false
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:            Recovery Mode: false
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:                  Storage: raft (HA available)
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:                  Version: Vault v1.10.3
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]:              Version Sha: af866591ee60485f05d6e32dd63dde93df686dfb
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: ==> Vault server started! Log data will stream in below:
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.128+0200 [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.140+0200 [DEBUG] storage.raft.fsm: time to open database: elapsed=11.643126ms path=/opt/raft/data/vault.db
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.149+0200 [DEBUG] core: set config: sanitized config="{\"api_addr\":\"[http://node_6.[REDACTED]:8200/]http://node_6.[REDACTED]:8200\",\"cache_size\":0,\"cluster_addr\":\"[http://node_6.[REDACTED]:8201/]http://node_6.[REDACTED]:8201\",\"cluster_cipher_suites\":\"\",\"cluster_name\":\"\",\"default_lease_ttl\":0,\"default_max_request_duration\":0,\"disable_cache\":false,\"disable_clustering\":false,\"disable_indexing\":false,\"disable_mlock\":true,\"disable_performance_standby\":false,\"disable_printable_check\":false,\"disable_sealwrap\":false,\"disable_sentinel_trace\":false,\"enable_response_header_hostname\":false,\"enable_response_header_raft_node_id\":false,\"enable_ui\":true,\"listeners\":[{\"config\":{\"address\":\"node_6.[REDACTED]:8200\",\"cluster_address\":\"node_6.[REDACTED]:8201\",\"tls_disable\":\"true\"},\"type\":\"tcp\"}],\"log_format\":\"unspecified\",\"log_level\":\"\",\"log_requests_level\":\"\",\"max_lease_ttl\":0,\"pid_file\":\"\",\"plugin_directory\":\"\",\"raw_storage_endpoint\":false,\"seals\":[{\"disabled\":false,\"type\":\"shamir\"}],\"storage\":{\"cluster_addr\":\"[http://node_6.[REDACTED]:8201/]http://node_6.[REDACTED]:8201\",\"disable_clustering\":false,\"redirect_addr\":\"[http://node_6.[REDACTED]:8200/]http://node_6.[REDACTED]:8200\",\"type\":\"raft\"}}"
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.150+0200 [DEBUG] storage.cache: creating LRU cache: size=0
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.150+0200 [INFO]  core: Initializing versionTimestamps for core
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.150+0200 [DEBUG] cluster listener addresses synthesized: cluster_addresses=[IP REDACTED NODE 6:8201]
Jun 16 17:03:38 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:38.150+0200 [DEBUG] would have sent systemd notification (systemd not present): notification=READY=1
Jun 16 17:03:39 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:39.972+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:39 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:39.972+0200 [INFO]  core: seal configuration missing, not initialized
Jun 16 17:03:40 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:40.680+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:40 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:40.680+0200 [INFO]  core: attempting to join possible raft leader node: leader_addr=http://node_1.[REDACTED]:8200
Jun 16 17:03:41 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:41.314+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:42 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:42.037+0200 [DEBUG] core: unseal key supplied: migrate=false
Jun 16 17:03:42 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:42.037+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:42 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:42.037+0200 [DEBUG] core: cannot unseal, not enough keys: keys=1 threshold=3 nonce=962e8434-841d-5022-9684-8564b0713139
Jun 16 17:03:42 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:42.037+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:47 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:47.738+0200 [DEBUG] core: unseal key supplied: migrate=false
Jun 16 17:03:47 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:47.738+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:47 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:47.738+0200 [DEBUG] core: cannot unseal, not enough keys: keys=2 threshold=3 nonce=962e8434-841d-5022-9684-8564b0713139
Jun 16 17:03:47 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:47.738+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.552+0200 [DEBUG] core: unseal key supplied: migrate=false
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.552+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.567+0200 [DEBUG] core: starting cluster listeners
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.567+0200 [INFO]  core.cluster-listener.tcp: starting listener: listener_address=IP REDACTED NODE 6:8201
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.567+0200 [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=IP REDACTED NODE 6:8201
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.579+0200 [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"node_6.[REDACTED]\", NotifyCh:(chan<- bool)(0xc0000a1ab0), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc00053aa80), NoSnapshotRestoreOnStart:true, skipStartup:false}"
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.590+0200 [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:node_1.[REDACTED] Address:node_1.[REDACTED]:8201} {Suffrage:Voter ID:node_2.[REDACTED] Address:node_2.[REDACTED]:8201} {Suffrage:Voter ID:node_3 Address:node_3.[REDATED]:8201} {Suffrage:Voter ID:node_5.[REDACTED] Address:node_5.[REDACTED]:8201} {Suffrage:Voter ID:node_4.[REDACTED] Address:node_4.[REDACTED]:8201} {Suffrage:Nonvoter ID:node_6.[REDACTED] Address:node_6.[REDACTED]:8201}]"
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.590+0200 [INFO]  core: security barrier not initialized
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.590+0200 [INFO]  storage.raft: entering follower state: follower="Node at node_6.[REDACTED]:8201 [Follower]" leader=
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.674+0200 [DEBUG] core.cluster-listener: performing server cert lookup
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.688+0200 [DEBUG] storage.raft.raft-net: accepted connection: local-address=node_6.[REDACTED]:8201 remote-address=IP REDACTED NODE LEADER:48152
Jun 16 17:03:53 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:53.689+0200 [WARN]  storage.raft: failed to get previous log: previous-index=8741 last-index=1 error="log not found"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.365+0200 [DEBUG] core.cluster-listener: performing server cert lookup
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.394+0200 [DEBUG] storage.raft.raft-net: accepted connection: local-address=node_6.[REDACTED]:8201 remote-address=IP REDACTED NODE LEADER:48154
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.603+0200 [WARN]  core: cluster listener is already started
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.617+0200 [DEBUG] storage.raft: raft already started, not setting up cluster
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.617+0200 [INFO]  core: writing raft TLS keyring to storage
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.617+0200 [ERROR] core: error writing raft TLS keyring: error="node is not the leader"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.617+0200 [INFO]  core: stopping raft server
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.623+0200 [ERROR] storage.raft.raft-net: failed to accept connection: error="Raft RPC layer closed"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.628+0200 [ERROR] core: failed to unseal: error="node is not the leader"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.628+0200 [ERROR] storage.raft.raft-net: failed to decode incoming command: error="transport shutdown"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.700+0200 [ERROR] storage.raft.raft-net: failed to decode incoming command: error="transport shutdown"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.798+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.798+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.880+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.880+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.998+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:54 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:54.998+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.132+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.132+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.302+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.302+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.312+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.312+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.528+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.528+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.918+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:55 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:55.918+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.031+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.031+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.638+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.638+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.658+0200 [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
Jun 16 17:03:56 node_6.[REDACTED] vault[125629]: 2022-06-16T17:03:56.658+0200 [DEBUG] core.cluster-listener: error handshaking cluster connection: error="unsupported protocol"

I use Ansible jobs to automate installing, uninstalling, and joining a node, so it is the same operation on the working nodes too.

cipherboy commented 2 years ago

@0xArKrant One additional question so I can make sure I understand this -- this is on a homogeneous cluster (all updated to v1.10.3) or is this on a split versioned cluster (v0.10.4 primary talking to a newer v1.10.3 secondary)?

0xArKrant commented 2 years ago

@cipherboy yes exactly, this is the same version everywhere, deployed the same way, all on the latest version. The leader node (the first one created) gets the old secrets via vault migrate from our old Consul backend.
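
For context, that migration step is roughly the following sketch; the addresses, paths and node_id are placeholders for the redacted production values, and the command is run while Vault is stopped on the target node.

# Hypothetical migrate.hcl describing the Consul source and raft destination.
cat > /tmp/migrate.hcl <<'EOF'
storage_source "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

storage_destination "raft" {
  path    = "/opt/raft/data/"
  node_id = "node_1.example"
}

cluster_addr = "http://node_1.example:8201"
EOF

# Copies all data from the Consul backend into the local raft directory.
vault operator migrate -config=/tmp/migrate.hcl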

raskchanky commented 2 years ago

@0xArKrant Can you share the command(s) you're using to start/unseal/join nodes in the cluster, as well as any commands that were run during the time those logs were being written? Some of the log messages make me wonder how this node is being joined to the existing cluster.

To make my point a little bit more concrete, below are the logs from the standby node of a 2 node test cluster I started on my local machine, also using raft storage. Notice how they differ from yours, specifically around joining the cluster.

==> Vault server configuration:

             Api Address: http://127.0.0.2:8200
                     Cgo: disabled
         Cluster Address: https://127.0.0.2:8201
              Go Version: go1.18.3
              Listener 1: tcp (addr: "127.0.0.2:8200", cluster address: "127.0.0.2:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: trace
                   Mlock: supported: false, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.12.0-dev1, built 2022-06-15T14:50:44Z
             Version Sha: 4762806f166adcc8c9baf50702c336db90987c56

==> Vault server started! Log data will stream in below:

2022-06-16T15:42:44.363-0700 [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2022-06-16T15:42:44.387-0700 [DEBUG] storage.raft.fsm: time to open database: elapsed=23.781753ms path=/Users/josh/tmp/vault-cluster/node-2/vault.db
2022-06-16T15:42:44.484-0700 [DEBUG] core: set config: sanitized config="{\"api_addr\":\"http://127.0.0.2:8200\",\"cache_size\":10,\"cluster_addr\":\"http://127.0.0.2:8201\",\"cluster_cipher_suites\":\"\",\"cluster_name\":\"\",\"default_lease_ttl\":0,\"default_max_request_duration\":0,\"disable_cache\":false,\"disable_clustering\":false,\"disable_indexing\":false,\"disable_mlock\":true,\"disable_performance_standby\":false,\"disable_printable_check\":false,\"disable_sealwrap\":false,\"disable_sentinel_trace\":false,\"enable_response_header_hostname\":false,\"enable_response_header_raft_node_id\":false,\"enable_ui\":false,\"listeners\":[{\"config\":{\"address\":\"127.0.0.2:8200\",\"cluster_address\":\"127.0.0.2:8201\",\"telemetry\":[{\"unauthenticated_metrics_access\":true}],\"tls_disable\":true},\"type\":\"tcp\"}],\"log_format\":\"unspecified\",\"log_level\":\"trace\",\"log_requests_level\":\"\",\"max_lease_ttl\":0,\"pid_file\":\"\",\"plugin_directory\":\"\",\"plugin_file_permissions\":0,\"plugin_file_uid\":0,\"raw_storage_endpoint\":true,\"seals\":[{\"disabled\":false,\"type\":\"shamir\"}],\"storage\":{\"cluster_addr\":\"http://127.0.0.2:8201\",\"disable_clustering\":false,\"redirect_addr\":\"http://127.0.0.2:8200\",\"type\":\"raft\"},\"telemetry\":{\"add_lease_metrics_namespace_labels\":false,\"circonus_api_app\":\"\",\"circonus_api_token\":\"\",\"circonus_api_url\":\"\",\"circonus_broker_id\":\"\",\"circonus_broker_select_tag\":\"\",\"circonus_check_display_name\":\"\",\"circonus_check_force_metric_activation\":\"\",\"circonus_check_id\":\"\",\"circonus_check_instance_id\":\"\",\"circonus_check_search_tag\":\"\",\"circonus_check_tags\":\"\",\"circonus_submission_interval\":\"\",\"circonus_submission_url\":\"\",\"disable_hostname\":true,\"dogstatsd_addr\":\"\",\"dogstatsd_tags\":null,\"lease_metrics_epsilon\":3600000000000,\"maximum_gauge_cardinality\":500,\"metrics_prefix\":\"\",\"num_lease_metrics_buckets\":168,\"prometheus_retention_time\":30000000000,\"stackdriver_debug_logs\":false,\"stackdriver_location\":\"\",\"stackdriver_namespace\":\"\",\"stackdriver_project_id\":\"\",\"statsd_address\":\"\",\"statsite_address\":\"\",\"usage_gauge_period\":600000000000}}"
2022-06-16T15:42:44.484-0700 [DEBUG] storage.cache: creating LRU cache: size=10
2022-06-16T15:42:44.485-0700 [INFO]  core: Initializing version history cache for core
2022-06-16T15:42:44.485-0700 [DEBUG] cluster listener addresses synthesized: cluster_addresses=[127.0.0.2:8201]
2022-06-16T15:42:44.486-0700 [DEBUG] would have sent systemd notification (systemd not present): notification=READY=1
2022-06-16T15:42:49.432-0700 [INFO]  core: security barrier not initialized
2022-06-16T15:42:49.432-0700 [INFO]  core: security barrier not initialized
2022-06-16T15:42:49.433-0700 [INFO]  core: attempting to join possible raft leader node: leader_addr=http://127.0.0.1:8200
2022-06-16T15:42:49.513-0700 [DEBUG] core: unseal key supplied: migrate=false
2022-06-16T15:42:49.513-0700 [INFO]  core: security barrier not initialized
2022-06-16T15:42:49.591-0700 [DEBUG] core: starting cluster listeners
2022-06-16T15:42:49.591-0700 [INFO]  core.cluster-listener.tcp: starting listener: listener_address=127.0.0.2:8201
2022-06-16T15:42:49.592-0700 [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.2:8201
2022-06-16T15:42:49.592-0700 [TRACE] storage.raft: setting up raft cluster
2022-06-16T15:42:49.592-0700 [TRACE] storage.raft: applying raft config: inputs="map[node_id:vault_2 path:/Users/josh/tmp/vault-cluster/node-2]"
2022-06-16T15:42:49.592-0700 [TRACE] storage.raft: using larger timeouts for raft at startup: initial_election_timeout=15s initial_heartbeat_timeout=15s normal_election_timeout=5s normal_heartbeat_timeout=5s
2022-06-16T15:42:49.666-0700 [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault_2\", NotifyCh:(chan<- bool)(0xc0000e0d20), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc000a0e120), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-06-16T15:42:49.704-0700 [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_1 Address:127.0.0.1:8201} {Suffrage:Nonvoter ID:vault_2 Address:127.0.0.2:8201}]"
2022-06-16T15:42:49.704-0700 [INFO]  storage.raft: entering follower state: follower="Node at 127.0.0.2:8201 [Follower]" leader-address= leader-id=
2022-06-16T15:42:49.705-0700 [TRACE] storage.raft: finished setting up raft cluster
2022-06-16T15:42:49.705-0700 [INFO]  core: security barrier not initialized
2022-06-16T15:42:50.368-0700 [DEBUG] core.cluster-listener: performing server cert lookup
2022-06-16T15:42:50.396-0700 [DEBUG] storage.raft.raft-net: accepted connection: local-address=127.0.0.2:8201 remote-address=127.0.0.2:58292
2022-06-16T15:42:50.677-0700 [WARN]  storage.raft: failed to get previous log: previous-index=42 last-index=1 error="log not found"
2022-06-16T15:42:51.007-0700 [DEBUG] core.cluster-listener: performing server cert lookup
2022-06-16T15:42:51.034-0700 [DEBUG] storage.raft.raft-net: accepted connection: local-address=127.0.0.2:8201 remote-address=127.0.0.2:58293
2022-06-16T15:42:51.708-0700 [WARN]  core: cluster listener is already started
2022-06-16T15:42:51.708-0700 [TRACE] storage.raft: setting up raft cluster
2022-06-16T15:42:51.708-0700 [DEBUG] storage.raft: raft already started, not setting up cluster
2022-06-16T15:42:51.708-0700 [INFO]  core: vault is unsealed
2022-06-16T15:42:51.708-0700 [INFO]  core: entering standby mode
2022-06-16T15:42:54.210-0700 [TRACE] core: found new active node information, refreshing
2022-06-16T15:42:54.210-0700 [DEBUG] core: parsing information for new active node: active_cluster_addr=https://127.0.0.1:8201 active_redirect_addr=http://127.0.0.1:8200
2022-06-16T15:42:54.211-0700 [DEBUG] core: refreshing forwarding connection: clusterAddr=https://127.0.0.1:8201
2022-06-16T15:42:54.211-0700 [DEBUG] core: clearing forwarding clients
2022-06-16T15:42:54.211-0700 [DEBUG] core: done clearing forwarding clients
2022-06-16T15:42:54.215-0700 [DEBUG] core: done refreshing forwarding connection: clusterAddr=https://127.0.0.1:8201
2022-06-16T15:42:54.216-0700 [DEBUG] core.cluster-listener: creating rpc dialer: address=127.0.0.1:8201 alpn=req_fw_sb-act_v1 host=fw-2700202d-b09e-364f-dde8-49d87a2c9d64
2022-06-16T15:42:54.230-0700 [DEBUG] core.cluster-listener: performing client cert lookup
2022-06-16T15:43:14.631-0700 [TRACE] storage.raft: triggering raft config reload due to initial timeout
2022-06-16T15:43:14.631-0700 [TRACE] storage.raft: reloaded raft config to set lower timeouts: config="raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"

For comparison, here are the logs from the same 2 node test cluster, this time from the active node (raft leader):

==> Vault server configuration:

             Api Address: http://127.0.0.1:8200
                     Cgo: disabled
         Cluster Address: https://127.0.0.1:8201
              Go Version: go1.18.3
              Listener 1: tcp (addr: "127.0.0.1:8200", cluster address: "127.0.0.1:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "disabled")
               Log Level: trace
                   Mlock: supported: false, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.12.0-dev1, built 2022-06-15T14:50:44Z
             Version Sha: 4762806f166adcc8c9baf50702c336db90987c56

==> Vault server started! Log data will stream in below:

2022-06-16T15:42:00.750-0700 [INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
2022-06-16T15:42:00.772-0700 [DEBUG] storage.raft.fsm: time to open database: elapsed=19.751088ms path=/Users/josh/tmp/vault-cluster/node-1/vault.db
2022-06-16T15:42:00.916-0700 [DEBUG] core: set config: sanitized config="{\"api_addr\":\"http://127.0.0.1:8200\",\"cache_size\":10,\"cluster_addr\":\"http://127.0.0.1:8201\",\"cluster_cipher_suites\":\"\",\"cluster_name\":\"\",\"default_lease_ttl\":0,\"default_max_request_duration\":0,\"disable_cache\":false,\"disable_clustering\":false,\"disable_indexing\":false,\"disable_mlock\":true,\"disable_performance_standby\":false,\"disable_printable_check\":false,\"disable_sealwrap\":false,\"disable_sentinel_trace\":false,\"enable_response_header_hostname\":false,\"enable_response_header_raft_node_id\":false,\"enable_ui\":false,\"listeners\":[{\"config\":{\"address\":\"127.0.0.1:8200\",\"cluster_address\":\"127.0.0.1:8201\",\"telemetry\":[{\"unauthenticated_metrics_access\":true}],\"tls_disable\":true},\"type\":\"tcp\"}],\"log_format\":\"unspecified\",\"log_level\":\"trace\",\"log_requests_level\":\"\",\"max_lease_ttl\":0,\"pid_file\":\"\",\"plugin_directory\":\"\",\"plugin_file_permissions\":0,\"plugin_file_uid\":0,\"raw_storage_endpoint\":true,\"seals\":[{\"disabled\":false,\"type\":\"shamir\"}],\"storage\":{\"cluster_addr\":\"http://127.0.0.1:8201\",\"disable_clustering\":false,\"redirect_addr\":\"http://127.0.0.1:8200\",\"type\":\"raft\"},\"telemetry\":{\"add_lease_metrics_namespace_labels\":false,\"circonus_api_app\":\"\",\"circonus_api_token\":\"\",\"circonus_api_url\":\"\",\"circonus_broker_id\":\"\",\"circonus_broker_select_tag\":\"\",\"circonus_check_display_name\":\"\",\"circonus_check_force_metric_activation\":\"\",\"circonus_check_id\":\"\",\"circonus_check_instance_id\":\"\",\"circonus_check_search_tag\":\"\",\"circonus_check_tags\":\"\",\"circonus_submission_interval\":\"\",\"circonus_submission_url\":\"\",\"disable_hostname\":true,\"dogstatsd_addr\":\"\",\"dogstatsd_tags\":null,\"lease_metrics_epsilon\":3600000000000,\"maximum_gauge_cardinality\":500,\"metrics_prefix\":\"\",\"num_lease_metrics_buckets\":168,\"prometheus_retention_time\":30000000000,\"stackdriver_debug_logs\":false,\"stackdriver_location\":\"\",\"stackdriver_namespace\":\"\",\"stackdriver_project_id\":\"\",\"statsd_address\":\"\",\"statsite_address\":\"\",\"usage_gauge_period\":600000000000}}"
2022-06-16T15:42:00.917-0700 [DEBUG] storage.cache: creating LRU cache: size=10
2022-06-16T15:42:00.920-0700 [INFO]  core: Initializing version history cache for core
2022-06-16T15:42:00.920-0700 [DEBUG] cluster listener addresses synthesized: cluster_addresses=[127.0.0.1:8201]
2022-06-16T15:42:00.922-0700 [DEBUG] would have sent systemd notification (systemd not present): notification=READY=1
2022-06-16T15:42:05.473-0700 [INFO]  core: security barrier not initialized
2022-06-16T15:42:05.473-0700 [DEBUG] core: bootstrapping raft backend
2022-06-16T15:42:05.473-0700 [TRACE] storage.raft: setting up raft cluster
2022-06-16T15:42:05.473-0700 [TRACE] storage.raft: applying raft config: inputs="map[node_id:vault_1 path:/Users/josh/tmp/vault-cluster/node-1]"
2022-06-16T15:42:05.553-0700 [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault_1\", NotifyCh:(chan<- bool)(0xc0000caa10), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc0008b2090), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-06-16T15:42:05.594-0700 [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_1 Address:127.0.0.1:8201}]"
2022-06-16T15:42:05.594-0700 [INFO]  storage.raft: entering follower state: follower="Node at vault_1 [Follower]" leader-address= leader-id=
2022-06-16T15:42:11.198-0700 [WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-06-16T15:42:11.198-0700 [INFO]  storage.raft: entering candidate state: node="Node at vault_1 [Candidate]" term=2
2022-06-16T15:42:11.317-0700 [DEBUG] storage.raft: votes: needed=1
2022-06-16T15:42:11.317-0700 [DEBUG] storage.raft: vote granted: from=vault_1 term=2 tally=1
2022-06-16T15:42:11.317-0700 [INFO]  storage.raft: election won: tally=1
2022-06-16T15:42:11.317-0700 [INFO]  storage.raft: entering leader state: leader="Node at vault_1 [Leader]"
2022-06-16T15:42:11.392-0700 [TRACE] storage.raft: finished setting up raft cluster
2022-06-16T15:42:11.392-0700 [DEBUG] core: finished bootstrapping raft backend
2022-06-16T15:42:11.658-0700 [INFO]  core: security barrier initialized: stored=1 shares=1 threshold=1
2022-06-16T15:42:11.814-0700 [DEBUG] core: cluster name not found/set, generating new
2022-06-16T15:42:11.814-0700 [DEBUG] core: cluster name set: name=vault-cluster-fc343858
2022-06-16T15:42:11.814-0700 [DEBUG] core: cluster ID not found, generating new
2022-06-16T15:42:11.814-0700 [DEBUG] core: cluster ID set: id=64942627-ac90-e296-0fe6-2f9af14c661f
2022-06-16T15:42:11.814-0700 [DEBUG] core: generating cluster private key
2022-06-16T15:42:11.820-0700 [DEBUG] core: generating local cluster certificate: host=fw-db548899-a22e-8d62-3536-18cc4f531250
2022-06-16T15:42:11.911-0700 [INFO]  core: post-unseal setup starting
2022-06-16T15:42:11.911-0700 [DEBUG] core: clearing forwarding clients
2022-06-16T15:42:11.911-0700 [DEBUG] core: done clearing forwarding clients
2022-06-16T15:42:11.911-0700 [DEBUG] core: persisting feature flags
2022-06-16T15:42:12.072-0700 [INFO]  core: loaded wrapping token key
2022-06-16T15:42:12.148-0700 [INFO]  core: Recorded vault version: vault version=1.12.0 upgrade time="2022-06-16 22:42:12.072423 +0000 UTC" build date=2022-06-15T14:50:44Z
2022-06-16T15:42:12.148-0700 [INFO]  core: successfully setup plugin catalog: plugin-directory=""
2022-06-16T15:42:12.148-0700 [INFO]  core: no mounts; adding default mount table
2022-06-16T15:42:12.310-0700 [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-06-16T15:42:12.313-0700 [INFO]  core: successfully mounted backend: type=system path=sys/
2022-06-16T15:42:12.316-0700 [INFO]  core: successfully mounted backend: type=identity path=identity/
2022-06-16T15:42:12.719-0700 [TRACE] token: no token generation counter found in storage
2022-06-16T15:42:12.719-0700 [INFO]  core: successfully enabled credential backend: type=token path=token/ namespace="ID: root. Path: "
2022-06-16T15:42:12.798-0700 [INFO]  rollback: starting rollback manager
2022-06-16T15:42:12.798-0700 [TRACE] expiration.job-manager: created dispatcher: name=expire-dispatcher num_workers=200
2022-06-16T15:42:12.799-0700 [TRACE] expiration.job-manager: initialized dispatcher: num_workers=200
2022-06-16T15:42:12.799-0700 [TRACE] expiration.job-manager: created job manager: name=expire pool_size=200
2022-06-16T15:42:12.799-0700 [TRACE] expiration.job-manager: starting job manager: name=expire
2022-06-16T15:42:12.799-0700 [TRACE] expiration.job-manager: starting dispatcher
2022-06-16T15:42:12.800-0700 [INFO]  core: restoring leases
2022-06-16T15:42:12.800-0700 [DEBUG] expiration: collecting leases
2022-06-16T15:42:12.802-0700 [DEBUG] expiration: leases collected: num_existing=0
2022-06-16T15:42:12.820-0700 [INFO]  expiration: lease restore complete
2022-06-16T15:42:12.952-0700 [DEBUG] identity: loading entities
2022-06-16T15:42:12.952-0700 [DEBUG] identity: entities collected: num_existing=0
2022-06-16T15:42:12.953-0700 [INFO]  identity: entities restored
2022-06-16T15:42:12.953-0700 [DEBUG] identity: identity loading groups
2022-06-16T15:42:12.953-0700 [DEBUG] identity: groups collected: num_existing=0
2022-06-16T15:42:12.953-0700 [INFO]  identity: groups restored
2022-06-16T15:42:12.953-0700 [DEBUG] identity: identity loading OIDC clients
2022-06-16T15:42:12.953-0700 [TRACE] mfa: loading login MFA configurations
2022-06-16T15:42:12.953-0700 [TRACE] mfa: methods collected: num_existing=0
2022-06-16T15:42:12.953-0700 [TRACE] mfa: configurations restored: namespace="" prefix=login-mfa/method/
2022-06-16T15:42:12.953-0700 [TRACE] mfa: loading login MFA enforcement configurations
2022-06-16T15:42:12.953-0700 [TRACE] mfa: enforcements configs collected: num_existing=0
2022-06-16T15:42:12.953-0700 [TRACE] mfa: enforcement configurations restored: namespace="" prefix=login-mfa/enforcement/
2022-06-16T15:42:12.954-0700 [TRACE] activity: scanned existing logs: out=[]
2022-06-16T15:42:12.954-0700 [TRACE] activity: scanned existing logs: out=[]
2022-06-16T15:42:12.954-0700 [TRACE] activity: no intent log found
2022-06-16T15:42:13.032-0700 [DEBUG] secrets.identity.identity_79a2b842: wrote OIDC default provider
2022-06-16T15:42:13.226-0700 [DEBUG] secrets.identity.identity_79a2b842: generated OIDC public key to sign JWTs: key_id=40f1e25b-c3e7-8327-6ee5-3d19655ffaf4
2022-06-16T15:42:13.499-0700 [DEBUG] secrets.identity.identity_79a2b842: generated OIDC public key for future use: key_id=884e35f7-5dd0-a694-a508-6e8fd35b804c
2022-06-16T15:42:13.578-0700 [DEBUG] secrets.identity.identity_79a2b842: wrote OIDC default key
2022-06-16T15:42:13.655-0700 [DEBUG] secrets.identity.identity_79a2b842: wrote OIDC allow_all assignment
2022-06-16T15:42:13.732-0700 [INFO]  core: post-unseal setup complete
2022-06-16T15:42:13.963-0700 [DEBUG] token: no wal state found when generating token
2022-06-16T15:42:13.963-0700 [INFO]  core: root token generated
2022-06-16T15:42:14.062-0700 [INFO]  core: pre-seal teardown starting
2022-06-16T15:42:14.062-0700 [INFO]  core: stopping raft active node
2022-06-16T15:42:14.062-0700 [DEBUG] expiration: stop triggered
2022-06-16T15:42:14.062-0700 [TRACE] expiration.job-manager: terminating job manager...
2022-06-16T15:42:14.062-0700 [TRACE] expiration.job-manager: terminating dispatcher
2022-06-16T15:42:14.062-0700 [DEBUG] expiration: finished stopping
2022-06-16T15:42:14.062-0700 [INFO]  rollback: stopping rollback manager
2022-06-16T15:42:14.063-0700 [INFO]  core: pre-seal teardown complete
2022-06-16T15:42:15.218-0700 [DEBUG] core: unseal key supplied: migrate=false
2022-06-16T15:42:15.219-0700 [DEBUG] core: starting cluster listeners
2022-06-16T15:42:15.220-0700 [INFO]  core.cluster-listener.tcp: starting listener: listener_address=127.0.0.1:8201
2022-06-16T15:42:15.220-0700 [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=127.0.0.1:8201
2022-06-16T15:42:15.220-0700 [TRACE] storage.raft: setting up raft cluster
2022-06-16T15:42:15.220-0700 [TRACE] storage.raft: applying raft config: inputs="map[node_id:vault_1 path:/Users/josh/tmp/vault-cluster/node-1]"
2022-06-16T15:42:15.220-0700 [TRACE] storage.raft: using larger timeouts for raft at startup: initial_election_timeout=15s initial_heartbeat_timeout=15s normal_election_timeout=5s normal_heartbeat_timeout=5s
2022-06-16T15:42:15.220-0700 [INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault_1\", NotifyCh:(chan<- bool)(0xc0001f2700), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0xc0008b2090), NoSnapshotRestoreOnStart:true, skipStartup:false}"
2022-06-16T15:42:15.258-0700 [INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:vault_1 Address:127.0.0.1:8201}]"
2022-06-16T15:42:15.258-0700 [TRACE] storage.raft: reloaded raft config to set lower timeouts: config="raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"
2022-06-16T15:42:15.258-0700 [TRACE] storage.raft: finished setting up raft cluster
2022-06-16T15:42:15.258-0700 [INFO]  storage.raft: entering follower state: follower="Node at 127.0.0.1:8201 [Follower]" leader-address= leader-id=
2022-06-16T15:42:15.258-0700 [INFO]  core: vault is unsealed
2022-06-16T15:42:15.258-0700 [WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader-addr= last-leader-id=
2022-06-16T15:42:15.258-0700 [INFO]  storage.raft: entering candidate state: node="Node at 127.0.0.1:8201 [Candidate]" term=3
2022-06-16T15:42:15.258-0700 [INFO]  core: entering standby mode
2022-06-16T15:42:15.372-0700 [DEBUG] storage.raft: votes: needed=1
2022-06-16T15:42:15.373-0700 [DEBUG] storage.raft: vote granted: from=vault_1 term=3 tally=1
2022-06-16T15:42:15.373-0700 [INFO]  storage.raft: election won: tally=1
2022-06-16T15:42:15.373-0700 [INFO]  storage.raft: entering leader state: leader="Node at 127.0.0.1:8201 [Leader]"
2022-06-16T15:42:15.488-0700 [INFO]  core: acquired lock, enabling active operation
2022-06-16T15:42:15.488-0700 [DEBUG] core: generating cluster private key
2022-06-16T15:42:15.492-0700 [DEBUG] core: generating local cluster certificate: host=fw-2700202d-b09e-364f-dde8-49d87a2c9d64
2022-06-16T15:42:15.580-0700 [INFO]  core: post-unseal setup starting
2022-06-16T15:42:15.580-0700 [DEBUG] core: clearing forwarding clients
2022-06-16T15:42:15.580-0700 [DEBUG] core: done clearing forwarding clients
2022-06-16T15:42:15.580-0700 [DEBUG] core: persisting feature flags
2022-06-16T15:42:15.653-0700 [INFO]  core: loaded wrapping token key
2022-06-16T15:42:15.654-0700 [INFO]  core: successfully setup plugin catalog: plugin-directory=""
2022-06-16T15:42:15.654-0700 [INFO]  core: successfully mounted backend: type=system path=sys/
2022-06-16T15:42:15.654-0700 [INFO]  core: successfully mounted backend: type=identity path=identity/
2022-06-16T15:42:15.655-0700 [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2022-06-16T15:42:15.656-0700 [TRACE] token: no token generation counter found in storage
2022-06-16T15:42:15.656-0700 [INFO]  core: successfully enabled credential backend: type=token path=token/ namespace="ID: root. Path: "
2022-06-16T15:42:15.656-0700 [TRACE] expiration.job-manager: created dispatcher: name=expire-dispatcher num_workers=200
2022-06-16T15:42:15.656-0700 [INFO]  rollback: starting rollback manager
2022-06-16T15:42:15.656-0700 [TRACE] expiration.job-manager: initialized dispatcher: num_workers=200
2022-06-16T15:42:15.656-0700 [TRACE] expiration.job-manager: created job manager: name=expire pool_size=200
2022-06-16T15:42:15.656-0700 [TRACE] expiration.job-manager: starting job manager: name=expire
2022-06-16T15:42:15.656-0700 [TRACE] expiration.job-manager: starting dispatcher
2022-06-16T15:42:15.656-0700 [INFO]  core: restoring leases
2022-06-16T15:42:15.657-0700 [DEBUG] expiration: collecting leases
2022-06-16T15:42:15.657-0700 [DEBUG] expiration: leases collected: num_existing=0
2022-06-16T15:42:15.657-0700 [DEBUG] identity: loading entities
2022-06-16T15:42:15.657-0700 [DEBUG] identity: entities collected: num_existing=0
2022-06-16T15:42:15.657-0700 [INFO]  expiration: lease restore complete
2022-06-16T15:42:15.657-0700 [INFO]  identity: entities restored
2022-06-16T15:42:15.657-0700 [DEBUG] identity: identity loading groups
2022-06-16T15:42:15.657-0700 [DEBUG] identity: groups collected: num_existing=0
2022-06-16T15:42:15.657-0700 [INFO]  identity: groups restored
2022-06-16T15:42:15.657-0700 [DEBUG] identity: identity loading OIDC clients
2022-06-16T15:42:15.657-0700 [TRACE] mfa: loading login MFA configurations
2022-06-16T15:42:15.657-0700 [TRACE] mfa: methods collected: num_existing=0
2022-06-16T15:42:15.657-0700 [TRACE] mfa: configurations restored: namespace="" prefix=login-mfa/method/
2022-06-16T15:42:15.657-0700 [TRACE] mfa: loading login MFA enforcement configurations
2022-06-16T15:42:15.657-0700 [TRACE] mfa: enforcements configs collected: num_existing=0
2022-06-16T15:42:15.657-0700 [TRACE] mfa: enforcement configurations restored: namespace="" prefix=login-mfa/enforcement/
2022-06-16T15:42:15.657-0700 [TRACE] activity: scanned existing logs: out=[]
2022-06-16T15:42:15.657-0700 [INFO]  core: starting raft active node
2022-06-16T15:42:15.657-0700 [TRACE] activity: no intent log found
2022-06-16T15:42:15.657-0700 [TRACE] activity: scanned existing logs: out=[]
2022-06-16T15:42:15.657-0700 [INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s false redundancy_zone upgrade_version}" reconcile_interval=0s
2022-06-16T15:42:15.658-0700 [DEBUG] storage.raft.autopilot: autopilot is now running
2022-06-16T15:42:15.658-0700 [DEBUG] storage.raft.autopilot: state update routine is now running
2022-06-16T15:42:15.658-0700 [DEBUG] core: request forwarding setup function
2022-06-16T15:42:15.658-0700 [DEBUG] core: clearing forwarding clients
2022-06-16T15:42:15.658-0700 [DEBUG] core: done clearing forwarding clients
2022-06-16T15:42:15.659-0700 [DEBUG] core: leaving request forwarding setup function
2022-06-16T15:42:15.733-0700 [INFO]  core: post-unseal setup complete
2022-06-16T15:42:31.443-0700 [INFO]  core: successful mount: namespace="" path=secret/ type=kv
2022-06-16T15:42:31.526-0700 [INFO]  secrets.kv.kv_a688fde2: collecting keys to upgrade
2022-06-16T15:42:31.526-0700 [INFO]  secrets.kv.kv_a688fde2: done collecting keys: num_keys=1
2022-06-16T15:42:31.526-0700 [DEBUG] secrets.kv.kv_a688fde2: upgrading keys: progress=0/1
2022-06-16T15:42:31.526-0700 [INFO]  secrets.kv.kv_a688fde2: upgrading keys finished
2022-06-16T15:42:49.513-0700 [TRACE] storage.raft: adding server to raft via autopilot: id=vault_2
2022-06-16T15:42:49.514-0700 [INFO]  storage.raft: updating configuration: command=AddNonvoter server-id=vault_2 server-addr=127.0.0.2:8201 servers="[{Suffrage:Voter ID:vault_1 Address:127.0.0.1:8201} {Suffrage:Nonvoter ID:vault_2 Address:127.0.0.2:8201}]"
2022-06-16T15:42:49.553-0700 [INFO]  storage.raft: added peer, starting replication: peer=vault_2
2022-06-16T15:42:49.555-0700 [DEBUG] core.cluster-listener: creating rpc dialer: address=127.0.0.2:8201 alpn=raft_storage_v1 host=raft-9a0d85bf-5791-db5d-cee8-229e3c4319ae
2022-06-16T15:42:49.591-0700 [INFO]  system: follower node answered the raft bootstrap challenge: follower_server_id=vault_2
2022-06-16T15:42:50.364-0700 [DEBUG] core.cluster-listener: creating rpc dialer: address=127.0.0.2:8201 alpn=raft_storage_v1 host=raft-9a0d85bf-5791-db5d-cee8-229e3c4319ae
2022-06-16T15:42:50.383-0700 [DEBUG] core.cluster-listener: performing client cert lookup
2022-06-16T15:42:50.572-0700 [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter vault_2 127.0.0.2:8201}" error="dial tcp 127.0.0.2:8201: connect: connection refused"
2022-06-16T15:42:50.677-0700 [WARN]  storage.raft: appendEntries rejected, sending older logs: peer="{Nonvoter vault_2 127.0.0.2:8201}" next=2
2022-06-16T15:42:50.715-0700 [INFO]  storage.raft: pipelining replication: peer="{Nonvoter vault_2 127.0.0.2:8201}"
2022-06-16T15:42:51.004-0700 [DEBUG] core.cluster-listener: creating rpc dialer: address=127.0.0.2:8201 alpn=raft_storage_v1 host=raft-9a0d85bf-5791-db5d-cee8-229e3c4319ae
2022-06-16T15:42:51.022-0700 [DEBUG] core.cluster-listener: performing client cert lookup
2022-06-16T15:42:54.217-0700 [DEBUG] core.cluster-listener: performing server cert lookup
2022-06-16T15:42:54.242-0700 [DEBUG] core.request-forward: got request forwarding connection
2022-06-16T15:43:05.658-0700 [INFO]  storage.raft.autopilot: Promoting server: id=vault_2 address=127.0.0.2:8201 name=vault_2
2022-06-16T15:43:05.658-0700 [INFO]  storage.raft: updating configuration: command=AddVoter server-id=vault_2 server-addr=127.0.0.2:8201 servers="[{Suffrage:Voter ID:vault_1 Address:127.0.0.1:8201} {Suffrage:Voter ID:vault_2 Address:127.0.0.2:8201}]"
0xArKrant commented 2 years ago

Hello @raskchanky, thank you for helping me! If you are familiar with Ansible, here are the 2 scripts I use to install Vault and join a cluster. If you are not familiar with it, I'll rewrite them as the equivalent shell commands. All 4 other nodes except the leader have been deployed with them.

I'm quite new to Ansible though, so it might not be perfect, but it still does the job (I think?).

COMMAND TO JOIN VAULT :

ansible-playbook \
        -u [REDACTED] \
        -i inventory/[REDACTED]-vault-2022.yml -i inventory[REDACTED]-vault-2022.secrets.yml \
        playbook-vault-2022.yml -e target=add-node \
                --limit "node_6.[REDACTED]" \
                -e vault_key_1=[REDACTED LEADER KEY 1] \
                -e vault_key_2=[REDACTED LEADER KEY 2] \
                -e vault_key_3=[REDACTED LEADER KEY 3]

INSTALL

- name: check if user vltadmin is created
  shell:
    cmd: "id -u vltadmin"

- name: create all required directories
  file:
    path: "{{ item }}"
    state: directory
    owner: vltadmin
    group: vltadmin
    mode: 0755
    recurse: yes
  loop:
    - '{{ vault_path }}/conf'
    - '{{ vault_path }}/ssl'
    - '{{ vault_path }}/bin'
    - '{{ raft_path }}/data'

- name: Download and unarchive vault selected version
  ansible.builtin.unarchive:
    src: https://[REDACTED]/vault_{{ version }}_linux_amd64.zip
    dest: '{{ vault_path }}/bin'
    remote_src: yes
    mode: 0755
    owner: vltadmin
    group: vltadmin

- name: add vault binary path and vault_addr to bashrc
  lineinfile:
    dest: /root/.bashrc
    line: '{{ item }}'
  with_items:
    - 'export PATH=$PATH:{{ vault_path }}/bin'
    - 'export VAULT_ADDR=http://{{ inventory_hostname }}:8200/'

- name: add vault binary path to vltadmin bashrc
  lineinfile:
    dest: /home/vltadmin/.bashrc
    line: 'export PATH=$PATH:{{ vault_path }}/bin'

- name: get installed vault version
  shell:
    cmd: "vault --version"
  register: cmd_output

- set_fact:
    vault_version: "{{ cmd_output.stdout }}"

- name: generate systemd service file
  template:
    src: "vault.systemd.j2"
    dest: /etc/systemd/system/vault.service

- name: generate config file
  template:
    src: "vault.conf.j2"
    dest: '{{ vault_path }}/conf/server.hcl'
    owner: vltadmin
    group: vltadmin
    mode: 0755

- name: reload systemd configuration
  systemd:
    daemon_reload: true

- name: raise file limits for vltadmin
  lineinfile:
    dest: /etc/security/limits.conf
    line: '{{ item }}'
  with_items:
    - 'vltadmin   soft    nofile    1024768'
    - 'vltadmin   hard    nofile    1024768'

- name: end message
  debug:
    msg:
      - "Vault install successful: {{ vault_path }} / {{ raft_path }}"
      - "Vault version : {{ vault_version }}"

add-node :

- name: check if user vltadmin is created
  shell:
    cmd: "id -u vltadmin"

- name: Check that the vault binary exists
  stat:
    path: "{{ vault_path }}/bin/vault"
  register: stat_result

- name: get current vault version
  shell:
    cmd: "vault --version"
  register: cmd_output

- set_fact:
    vault_version: "{{ cmd_output.stdout }}"

- name: Start service vault, if exist
  ansible.builtin.systemd:
    name: vault
    state: started
  ignore_errors: yes

- set_fact:
    new_list: "{{ groups['all'] | reject('search', ansible_limit) | list }}"

- name: get current leader node
  uri:
    url: 'http://{{ item }}:8200/v1/sys/leader'
    method: GET
    body_format: json
    status_code: 200
    return_content: yes
  register: data
  with_random_choice: "{{ new_list }}"
  until: data.json.leader_address | length > 0
  retries: 10
  delay: 5

- set_fact:
    leader_addr: '{{ data.results[0].json.leader_address }}'

- name: Check current vault status
  shell:
    cmd: "vault status | grep Initialized | awk '{print $2}'"
  register: current_vault_status

- name: Join Cluster
  shell:
    cmd: "vault operator raft join {{ leader_addr }}"

- name: Check current vault status
  shell:
    cmd: "vault status | grep Initialized | awk '{print $2}'"
  register: joined_vault_status
  failed_when: joined_vault_status.stdout == "false"

- name: Unseal vault key 1
  shell: |
    vault operator unseal {{ vault_key_1 }}

- name: Wait for vault unseal 2
  pause:
    seconds: 5

- name: Unseal vault key 2
  shell: |
    vault operator unseal {{ vault_key_2 }}

- name: Wait for vault unseal 3
  pause:
    seconds: 5

- name: Unseal vault key 3
  shell: |
    vault operator unseal {{ vault_key_3 }}

- name: Check current vault seal status
  shell:
    cmd: "vault status | grep Sealed | awk '{print $2}'"
  register: vault_seal_status
  failed_when: vault_seal_status.stdout == "false"

- name: end message
  debug:
    msg: "Vault cluster joined successfully: {{ leader_addr }}"

Systemd:

### BEGIN INIT INFO
# Provides:          vault
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Vault server
# Description:       Vault secret management tool
### END INIT INFO

[Unit]
Description=Vault secret management tool
Requires=network-online.target
After=network-online.target

[Service]
User=vltadmin
Group=vltadmin
PIDFile=/opt/vault/run/vault.pid
ExecStartPre=-/bin/mkdir -p /opt/vault/run
ExecStart=/opt/vault/bin/vault server -config=/opt/vault/conf/server.hcl -log-level=debug
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
KillSignal=SIGTERM
Restart=on-failure
RestartSec=42s
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target

raskchanky commented 2 years ago

@0xArKrant Thanks for the detailed configuration! I don't know much at all about Ansible, but from what I can tell, it looks like your playbook is set up to do the right thing, namely joining the raft cluster and then unsealing. It's strange that only node 6 exhibits this problem.

You might have more luck getting an answer to this problem by posting on the forum. GitHub issues tend to focus on either bug reports or feature requests.

0xArKrant commented 2 years ago

@raskchanky Hello, it doesn't appear only on this node; it is, as I said earlier, quite random. Sometimes it works and sometimes it doesn't, but I haven't been able to find the cause due to the little information I get from the logs. I set up a cluster earlier without migrating data from the old cluster and it seemed to work without issue.

I'll run more tests over the next days, but could it be related to the migration?

0xArKrant commented 2 years ago

Hi @raskchanky / @cipherboy. I've got news for you. I'm discussing it here because it could potentially be a bug.

So I've been messing around with Vault and Consul for the past 4 days, trying things out:

- Vault 1.10.3 + latest Consul, snapshot and then restore -> no problem
- Vault 1.10.3 + fresh raft cluster -> no problem
- Migration from Consul (using vault migrate) -> new Vault 1.10.3 cluster with raft -> tls internal error

I've come to the conclusion that the problem occurs during the migration from our old Consul backend to the new Vault/raft cluster.

It is a migration from an old infrastructure to a new one, with a new IP range, VLAN, and hostnames.

I made a Consul cluster with a single Vault, created a few secrets, and did the migration. Then I created a Vault cluster from that migration, and there was no error.

So my take on this is that during the migration (old to new), Vault also exports the TLS certificates used on the old cluster (old infrastructure), which then causes a tls internal error on the new cluster (new infrastructure).

Maybe you can tell me more about this?

exnerd commented 2 years ago

Hi,

I can confirm that the problem persists on Vault v1.11.0.
I tried to migrate from Vault v1.9.4 with Consul v1.11.1, but I have the same TLS issues as @0xArKrant. Sometimes it works, sometimes it doesn't. As there is no workaround for this issue at the moment, I don't see any working alternative other than staying on the Consul storage backend.

exnerd commented 2 years ago

We have found a workaround for this issue.
After migrating the data from Consul to raft and unsealing the leader, you have to take a raft snapshot. After that, re-initialize Vault, restore the raft snapshot, unseal Vault, and add the followers. That procedure worked for us.
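
In command form, the workaround is roughly the following sketch (paths are placeholders, and it assumes Shamir seals as in the configs above):

# On the freshly migrated leader, after unsealing it: take a snapshot.
vault operator raft snapshot save /tmp/post-migrate.snap

# Wipe the raft data directory and start over with a clean initialization.
systemctl stop vault
rm -rf /opt/raft/data/*
systemctl start vault
vault operator init                 # produces new unseal keys / root token
vault operator unseal "$NEW_KEY_1"  # repeat until the new threshold is met

# Restore the snapshot taken above; -force skips the cluster-ID check,
# since this is now a brand-new cluster.
vault operator raft snapshot restore -force /tmp/post-migrate.snap

# The restored data carries the original barrier keys, so unseal again with
# the pre-reinit key shares, then join and unseal the follower nodes as usual.
vault operator unseal "$OLD_KEY_1"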

0xArKrant commented 2 years ago

Hi @exnerd, thank you very much for the workaround. I'll try it on Monday and give an update here. Have a nice weekend :)

0xArKrant commented 2 years ago

Hey, sorry, I got busy. I can confirm that the workaround from @exnerd worked for me too, thanks. So we can agree this is a bug when migrating directly from Consul using vault migrate.

eliottcassidy2000 commented 2 years ago

@raskchanky hey, I wanted to say that I have the same issue: migrating Consul to raft on Vault 1.10.2. I've also spent some days on this, and the error seems to stem from "core: writing raft TLS keyring to storage" on the broken follower node. If you look at @0xArKrant's logs in the original post, you will see this line; it's the same thing I'm getting. I don't know what causes the node to randomly decide to try to write the keyring, but on the other follower nodes that do work, it never attempts to write any keyring. Instead, the node just unseals and then goes into standby mode as it should. It seems that if we can figure out why it is trying to create a TLS keyring, we can find the issue. (I'm assuming that during normal intended behavior, "core: writing raft TLS keyring to storage" would not appear in the logs of a Vault follower node after it tries to unseal. If this assumption is wrong, please correct me.) For now I'm trying the workaround.

ncabatoff commented 2 years ago

This looks like it may be a dup of #12487, where the reporter discovered that it seemed to be a timing issue related to the size of the DB. The snapshot workaround probably helps because snapshotting also compacts the raft logs as a side-effect, and it's much faster to initialize a new peer after that.