
Migration from file to raft fails for large indexes #12487

Open gnugnug opened 3 years ago

gnugnug commented 3 years ago

Environment:

Background: We have a Vault instance using the file storage backend and want to migrate it to raft integrated storage. Therefore, we performed the following steps:

$ cat vaultmigrate.hcl
storage_source "file" {
  path = "/var/lib/vault-file/"
}
storage_destination "raft" {
  path = "/var/lib/vault/"
  node_id = "node01"
}
cluster_addr="https://node01:8201"

# === Migrate storage from file to raft ===

$ vault operator migrate -config=vaultmigrate.hcl
[INFO]  creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"node01", NotifyCh:(chan<- bool)(0xc000d14000), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.intLogger)(0xc00057c500), NoSnapshotRestoreOnStart:true, skipStartup:false}"
[INFO]  initial configuration: index=1 servers="[{Suffrage:Voter ID:node01 Address:node01:8201}]"
[INFO]  entering follower state: follower="Node at node01 [Follower]" leader=
[WARN]  heartbeat timeout reached, starting election: last-leader=
[INFO]  entering candidate state: node="Node at node01 [Candidate]" term=2
[INFO]  election won: tally=1
[INFO]  entering leader state: leader="Node at node01 [Leader]"
[INFO]  copied key: path=audit/771f0d12-216c-0155-01b1-b445e76360300/salt
[...]
[INFO]  copied key: path=sys/token/salt
Success! All of the keys have been migrated.
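For reference, a quick sanity check after migration is to confirm the raft database and snapshot directory exist in the destination path (a sketch; the layout below assumes the default file names used by integrated storage):

$ ls /var/lib/vault/
raft  vault.db
$ ls /var/lib/vault/raft/
raft.db  snapshots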

# === Vault server config ===

$ cat /etc/vault.hcl
storage "raft" {
  path = "/var/lib/vault"
  node_id = "node01"
}
api_addr = "https://node01:8200"
cluster_addr = "https://node01:8201"
listener "tcp" {
  address = "0.0.0.0:8200"
  cluster_address = "0.0.0.0:8201"
  tls_cert_file = "node01.crt"
  tls_key_file = "node01.key"
}
ui = true
disable_mlock = true
pid_file = "/var/run/vault/vault.pid"

# === Start Vault server with the new raft integrated storage and unseal it ===

$ vault server -config=/etc/vault.hcl
==> Vault server configuration:

             Api Address: https://node01:8200
                     Cgo: disabled
         Cluster Address: https://node01:8201
              Go Version: go1.16.6
              Listener 1: tcp (addr: "0.0.0.0:8200", cluster address: "0.0.0.0:8201", max_request_duration: "1m30s", max_request_size: "33554432", tls: "enabled")
               Log Level: info
                   Mlock: supported: true, enabled: false
           Recovery Mode: false
                 Storage: raft (HA available)
                 Version: Vault v1.8.1
             Version Sha: 4b0264f28defc05454c31277cfa6ff63695a458d

==> Vault server started! Log data will stream in below:

[INFO]  proxy environment: http_proxy="" https_proxy="" no_proxy=""
[INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
[INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=0.0.0.0:8201
[INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:"node01", NotifyCh:(chan<- bool)(0xc000dab650), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc000b80930), NoSnapshotRestoreOnStart:true, skipStartup:false}"
[INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:node01 Address:node01:8201}]"
[INFO]  storage.raft: entering follower state: follower="Node at node01:8201 [Follower]" leader=
[WARN]  storage.raft: heartbeat timeout reached, starting election: last-leader=
[INFO]  storage.raft: entering candidate state: node="Node at node01:8201 [Candidate]" term=3
[INFO]  storage.raft: election won: tally=1
[INFO]  storage.raft: entering leader state: leader="Node at node01:8201 [Leader]"
[INFO]  core: writing raft TLS keyring to storage
[INFO]  core: vault is unsealed
[INFO]  core: entering standby mode
[INFO]  core: acquired lock, enabling active operation
[INFO]  core: post-unseal setup starting
[INFO]  core: loaded wrapping token key
[INFO]  core: successfully setup plugin catalog: plugin-directory=""
[INFO]  core: successfully mounted backend: type=system path=sys/
[INFO]  core: successfully mounted backend: type=identity path=identity/
[INFO]  core: successfully mounted backend: type=kv path=secret/
[INFO]  core: successfully mounted backend: type=pki path=pki/
[INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
[INFO]  core: successfully enabled credential backend: type=token path=token/
[INFO]  core: successfully enabled credential backend: type=ldap path=ldap/
[INFO]  core: successfully enabled credential backend: type=approle path=approle/
[INFO]  rollback: starting rollback manager
[INFO]  core: restoring leases
[INFO]  expiration: lease restore complete
[INFO]  identity: entities restored
[INFO]  identity: groups restored
[INFO]  core: starting raft active node
[INFO]  storage.raft: starting autopilot: config="&{false 0 10s 24h0m0s 1000 0 10s}" reconcile_interval=0s
[INFO]  core: usage gauge collection is disabled
[INFO]  core: post-unseal setup complete

$ vault status
Key                     Value
---                     -----
Seal Type               shamir
Initialized             true
Sealed                  false
Version                 1.8.1
Storage Type            raft
HA Enabled              true
HA Cluster              https://node01:8201
HA Mode                 active
Raft Committed Index    24425
Raft Applied Index      24425
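Before joining node02, the single-node cluster membership can also be confirmed with list-peers (illustrative output, matching the addresses above):

$ vault operator raft list-peers
Node      Address        State     Voter
----      -------        -----     -----
node01    node01:8201    leader    true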

Now we switch to node02, start Vault there and join it to the cluster:

$ vault server -config=vault.hcl # Same config as above, with node01 replaced by node02
$ vault operator raft join https://node01:8200
Key       Value
---       -----
Joined    true
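As an alternative to the manual join command, node02's config could declare the join target with a retry_join stanza inside the raft storage block, which keeps retrying until the join succeeds (a sketch; the CA file name is an assumption for this example):

storage "raft" {
  path    = "/var/lib/vault"
  node_id = "node02"
  retry_join {
    leader_api_addr     = "https://node01:8200"
    leader_ca_cert_file = "node01-ca.crt"  # assumed CA file for the node01 listener cert
  }
}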

Problem: As soon as we unseal node02, it starts communicating with node01 but runs into the following error:

[INFO]  core.cluster-listener.tcp: starting listener: listener_address=0.0.0.0:8201
[INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=0.0.0.0:8201
[INFO]  storage.raft: creating Raft: config="&raft.Config{ProtocolVersion:3, HeartbeatTimeout:2000000000, ElectionTimeout:2000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:1000000000, LocalID:"node02", NotifyCh:(chan<- bool)(0xc0003dce70), LogOutput:io.Writer(nil), LogLevel:"DEBUG", Logger:(*hclog.interceptLogger)(0xc00028d7d0), NoSnapshotRestoreOnStart:true, skipStartup:false}"
[INFO]  storage.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:node01 Address:node01:8201} {Suffrage:Nonvoter ID:node02 Address:node02:8201}]"
[INFO]  storage.raft: entering follower state: follower="Node at node02:8201 [Follower]" leader=
[INFO]  core: security barrier not initialized
[WARN]  storage.raft: failed to get previous log: previous-index=24433 last-index=1 error="log not found"
[WARN]  core: cluster listener is already started
[INFO]  core: writing raft TLS keyring to storage
[ERROR] core: error writing raft TLS keyring: error="node is not the leader"
[INFO]  core: stopping raft server
[ERROR] storage.raft.raft-net: failed to accept connection: error="Raft RPC layer closed"
[ERROR] core: failed to unseal: error="node is not the leader"
[ERROR] storage.raft.raft-net: failed to decode incoming command: error="transport shutdown"
[WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]
[ERROR] storage.raft.raft-net: failed to decode incoming command: error="transport shutdown"
[WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["raft_storage_v1"]

On node01 we see the following error message:

[INFO]  storage.raft: updating configuration: command=AddNonvoter server-id=node02 server-addr=node02:8201 servers="[{Suffrage:Voter ID:node01 Address:node01:8201} {Suffrage:Nonvoter ID:node02 Address:node02:8201}]"
[INFO]  storage.raft: added peer, starting replication: peer=node02
[INFO]  system: follower node answered the raft bootstrap challenge: follower_server_id=node02
[ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter node02 node02:8201}" error="dial tcp 10.0.0.2:8201: connect: connection refused"
[WARN]  storage.raft: appendEntries rejected, sending older logs: peer="{Nonvoter node02 node02:8201}" next=2
[ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter node02 node02:8201}" error=EOF
[INFO]  storage.raft: pipelining replication: peer="{Nonvoter node02 node02:8201}"
[INFO]  storage.raft: aborting pipeline replication: peer="{Nonvoter node02 node02:8201}"
[ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter node02 node02:8201}" error="remote error: tls: internal error"

What's interesting is that node02 logs the line "core: writing raft TLS keyring to storage". The raft cluster already has a keyring created by node01, so node02 shouldn't be creating one as well, should it?

The behaviour is reproducible; the error messages are exactly the same on every join. Even if we restart both nodes, the cluster join never succeeds.

Workaround: If we delete about 20,000 secrets before joining node02 to the cluster, the join works without problems. So it cannot be a permission or network issue; it looks more like a timing issue. Can you have a look into this?
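One way to quantify "large" here is the Raft index on the active node, which a joining follower has to replicate from scratch; in the failing logs above, node01 is at previous-index=24433 while node02 is still at last-index=1. A hypothetical check on the active node (vault status output trimmed to the relevant lines):

$ VAULT_ADDR=https://node01:8200 vault status | grep Raft
Raft Committed Index    24433
Raft Applied Index      24433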

isserrano commented 2 years ago

I am having the same issue :-( Versions: Vault v1.8.2 and v1.9.0

isserrano commented 2 years ago

I have found a workaround; it is not ideal, but it works. After doing the migration on the first node, I restart it twice. I can see "writing raft TLS keyring to storage" only on the first start, which may be related. Then I bring up each other node one by one. The first time, each of them fails with the same TLS errors, so I delete everything in the raft data dir, remove the node from the cluster with remove-peer, and restart the pod. Once that is done, it joins the cluster without problems.
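In command form, that per-node loop looks roughly like this (a sketch; node ID, paths, and addresses taken from the configs earlier in this issue):

# On the active node: drop the failed follower from the raft configuration
$ vault operator raft remove-peer node02

# On the failed follower: wipe its raft data dir and restart the pod/service
$ rm -rf /var/lib/vault/*
$ systemctl restart vault

# Re-join and unseal
$ vault operator raft join https://node01:8200
$ vault operator unseal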

GMartinez-Sisti commented 2 years ago

I had this issue and found a fix that I believe is safe. After migrating the data from Consul, the Vault node address was not correct: I was able to add nodes to the cluster, but as soon as one node was restarted, the cluster was lost and couldn't recover. This is probably caused by the previous configuration using a Consul agent sidecar on 127.0.0.1 to reach the backend.

vault operator raft list-peers

Node       Address           State     Voter
----       -------           -----     -----
vault-0    127.0.0.1:8201    leader    true

To fix this I scaled the statefulset down to 1 node, and then followed the instructions from the article "How to recover from permanently lost quorum while using Raft integrated storage with Vault" with the following file:

[
  {
    "id": "vault-0",
    "address": "vault-0.vault-internal:8201",
    "non_voter": false
  }
]

Restarting the pod updates the internal address to the correct one (the pod logs confirm this). After that I was able to add more nodes to the cluster and restart any of them (including vault-0), and the raft cluster always recovered.
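For completeness, the peers.json file from that procedure goes into the raft subdirectory of the node's data path before the restart, e.g. (assuming the /var/lib/vault path from the configs earlier in this issue; the Kubernetes helm chart typically uses /vault/data instead):

$ cat > /var/lib/vault/raft/peers.json <<'EOF'
[
  {
    "id": "vault-0",
    "address": "vault-0.vault-internal:8201",
    "non_voter": false
  }
]
EOF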

Hope it helps. I'm not sure whether this behaviour is by design due to the restore, or whether something is not working as it should.

aphorise commented 2 years ago

Migrating on a single node, with at least one restart before any join is attempted, makes the most sense; otherwise there are too many changes being attempted at any given time.
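In other words, something like this (a sketch of that order of operations, reusing the commands from earlier in this issue):

# 1. Migrate on a single node
$ vault operator migrate -config=vaultmigrate.hcl

# 2. Start and unseal that node, then restart it at least once before any join
$ vault server -config=/etc/vault.hcl
$ vault operator unseal
$ systemctl restart vault && vault operator unseal

# 3. Only now join the remaining nodes, one at a time
$ vault operator raft join https://node01:8200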

@gnugnug - I'm curious if you've retested this flow in the most recent versions and if it's still applicable for you?