hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Vault operator raft join always 'joined' during init & never fails #9457

Closed by aphorise 2 years ago

aphorise commented 4 years ago

During initialisation, while a node is not yet unsealed, vault operator raft join ... unconditionally reports success: it prints joined and returns a 0 exit code.

The /sys/storage/raft/join API similarly returns an HTTP 200 with a joined JSON response body.

There is presently no way to determine when a node has actually been added to the raft peers list, short of issuing an additional list-peers request afterwards and comparing the results.
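Until the join response itself is reliable, the list-peers workaround mentioned above can be sketched roughly as follows. This is only an illustration: peer_listed and join_and_verify are hypothetical helper names, the sketch assumes that the JSON output of list-peers contains entries of the form "node_id": "<id>", and it assumes a valid token for the leader is available (for example via VAULT_TOKEN). A node that still has to be unsealed will not show up as a peer until that has happened, hence the polling loop.

```shell
#!/bin/sh
# Hypothetical helper: reads raft peer JSON on stdin and succeeds only if
# NODE_ID is listed. Assumes entries look like "node_id": "<id>".
# NODE_ID is interpolated into a regex, so keep it to plain characters.
peer_listed() {
  node_id="$1"
  grep -Eq "\"node_id\"[[:space:]]*:[[:space:]]*\"${node_id}\""
}

# Hypothetical wrapper: request the join on the local node, then poll the
# leader's peer list until NODE_ID appears or the attempts run out.
# Assumes the leader is unsealed and the token in use may read the raft
# configuration; TLS flags for the join are omitted for brevity.
join_and_verify() {
  leader_addr="$1"
  node_id="$2"
  attempts="${3:-10}"

  vault operator raft join "$leader_addr" >/dev/null || return 1

  i=0
  while [ "$i" -lt "$attempts" ]; do
    if vault operator raft list-peers -address="$leader_addr" -format=json 2>/dev/null \
        | peer_listed "$node_id"; then
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

A call such as join_and_verify https://192.168.178.253:8200 vault2 would then exit non-zero when the node never appears among the peers, which is the signal the plain join command does not give in the scenario described here.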

To Reproduce

# New Vault process with no auto-unseal on host vault2 (192.168.178.252), one of 3 nodes (the others being 192.168.178.253 & 192.168.178.251):
#vagrant@vault2:~$ \
vault operator init -key-shares=1 -key-threshold=1 -format=json ;
  # ...

vault status ;
  # Key                Value
  # ---                -----
  # Seal Type          shamir
  # Initialized        true
  # Sealed             true
  # Total Shares       1
  # Threshold          1
  # Unseal Progress    0/1
  # Unseal Nonce       n/a
  # Version            1.4.3
  # HA Enabled         true

vault operator raft join -format=json https://192.168.178.253:8200 ; echo $?
  # {
  #   "joined": true
  # }
  # 0

PAYLOAD='{
  "leader_api_addr": "https://192.168.178.253:8200",
  "leader_ca_cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
  "leader_client_cert": "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----",
  "leader_client_key": "-----BEGIN RSA PRIVATE KEY-----\n...\n-----END RSA PRIVATE KEY-----"
}' ;
curl -v -X PUT -H "X-Vault-Request: true" -H "X-Vault-Token: $(vault print token)" -d "${PAYLOAD}" https://192.168.178.252:8200/v1/sys/storage/raft/join
  # < HTTP/2 200
  # < cache-control: no-store
  # < content-type: application/json
  # < content-length: 16
  # < date: Fri, 10 Jul 2020 21:46:30 GMT
  # <
  # {"joined":true}

Expected behavior

Provide a contextual response that returns an HTTP 2xx status / 0 exit code and a joined message only when the node has actually been added as a peer.

Environment:

Vault server configuration file(s):

cluster_name = "primary"
api_addr = "https://192.168.178.252:8200"
cluster_addr = "https://192.168.178.252:8201"

listener "tcp" {
  address       = "0.0.0.0:8200"
  cluster_address  = "192.168.178.252:8201"
  tls_cert_file = "/home/vagrant/vault2_certificate.crt"
  tls_key_file = "/home/vagrant/vault2_private.key"
#  tls_disable      = true
}

storage "raft" {
        path            = "/vault/data"
        node_id         = "vault2"
}

disable_mlock = true
ui = true

jay-dee7 commented 3 years ago

We recently started exploring Vault's integrated storage backend and ran into this issue. vault operator raft join <vault-node-addr> returns joined: true even when the node doesn't join the network. I think this is a small but serious bug that should be handled as a priority.

ncabatoff commented 2 years ago

I was not able to reproduce this issue:

ncc$ ./vault operator raft join http://192.168.0.2:8200; echo $?
Error joining the node to the Raft cluster: Error making API request.

URL: POST http://127.0.0.3:8200/v1/sys/storage/raft/join
Code: 500. Errors:

* failed to join raft cluster: failed to join any raft leader node
2

In the logs:

2021-07-20T17:24:43.339-0400 [INFO]  core: attempting to join possible raft leader node: leader_addr=http://192.168.0.2:8200
2021-07-20T17:25:13.345-0400 [WARN]  core: join attempt failed: error="error during raft bootstrap init call: Put "http://192.168.0.2:8200/v1/sys/storage/raft/bootstrap/challenge": dial tcp 192.168.0.2:8200: i/o timeout"
2021-07-20T17:25:13.345-0400 [ERROR] core: failed to join raft cluster: error="failed to join any raft leader node"

The 192.168.0.2 node status:

ncc$ ./vault status
Key                Value
---                -----
Seal Type          shamir
Initialized        true
Sealed             true
...

I suspect we've improved this behaviour in recent versions. Please open a new issue if you observe this in current Vault versions.

danmur commented 2 years ago

EDIT: missed the "make a new issue" bit, sorry. LMK if it's worthwhile making a new issue.

Perhaps not the same problem, but it definitely should fail in some obvious cases:

[root@rom:~]# vault operator raft join http://10.1.1.92:8200
Key       Value
---       -----
Joined    true

[root@rom:~]# vault operator raft join http://10.1.1.92:8201
Key       Value
---       -----
Joined    true

[root@rom:~]# vault operator raft join http://10.1.1.92:82
Key       Value
---       -----
Joined    true

[root@rom:~]# vault operator raft join http://10
Key       Value
---       -----
Joined    true

[root@rom:~]# vault operator raft join http://10dddddd
Key       Value
---       -----
Joined    true

[root@rom:~]# vault --version
Vault v1.10.3 (v1.10.3) (cgo)
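
As a stopgap for the obviously bad addresses above, a pre-flight check before the join might look like the sketch below. It is not part of Vault: leader_reachable is a hypothetical helper, and it assumes curl is installed and that the target serves the unauthenticated /v1/sys/leader endpoint over the given scheme (TLS options such as a CA certificate are left out).

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeeds only if something at ADDR answers
# the /v1/sys/leader endpoint with a successful HTTP status within 5 seconds.
# It does not prove that the join itself will succeed.
leader_reachable() {
  addr="$1"
  curl -fsS --max-time 5 -o /dev/null "${addr}/v1/sys/leader" 2>/dev/null
}

# Usage sketch: only attempt the join when the address responds.
# if leader_reachable "http://10.1.1.92:8200"; then
#   vault operator raft join "http://10.1.1.92:8200"
# else
#   echo "leader address not reachable" >&2
# fi
```

This only filters out unreachable or malformed addresses such as http://10dddddd; confirming actual membership still needs a look at the peer list.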