hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Bad interaction between Auto Unseal with Azure Key Vault + Integrated Storage #16158

Open erhlee-bird opened 2 years ago

erhlee-bird commented 2 years ago

Describe the bug

I am attempting to set up a highly available cluster configuration with 3 nodes using raft integrated storage and auto unseal with Azure Key Vault.

2 of the 3 nodes fail to auto-unseal and also fail to join the raft cluster.

I believe that the uninitialized nodes joining the cluster face a chicken and egg problem where they cannot join an existing cluster without unsealing but cannot unseal without being initialized.

To Reproduce

After configuring the nodes, I run the following operations:

  1. vault operator init on just one node in the cluster. The node successfully auto-unseals itself and self-elects as leader.
Key                      Value
---                      -----
Recovery Seal Type       shamir
Initialized              true
Sealed                   false
Total Recovery Shares    1
Threshold                1
Version                  1.10.3
Storage Type             raft
Cluster Name             vault-cluster-0a6b1c01
Cluster ID               cb3dc44d-80f2-9ec5-e700-26a2bbeaa11d
HA Enabled               true
HA Cluster               n/a
HA Mode                  standby
Active Node Address      <none>
Raft Committed Index     40

It eventually goes into standby after complaining about unstable configuration (presumably as the other nodes are unable to join).

[WARN]  storage.raft: not part of stable configuration, aborting election
  2. vault operator raft join from each of the other nodes. The command reports success, but none of the nodes reflect a successful cluster join.
Key       Value
---       -----
Joined    true

Key                      Value
---                      -----
Recovery Seal Type       azurekeyvault
Initialized              false
Sealed                   true
Total Recovery Shares    0
Threshold                0
Unseal Progress          0/0
Unseal Nonce             n/a
Version                  1.10.3
Storage Type             raft
HA Enabled               true
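
Since "Joined true" alone does not show that a node ended up initialized and unsealed, the check can be scripted against the status output. The following is only a sketch: it assumes the JSON printed by vault status -format=json contains "initialized" and "sealed" boolean fields, and the function names node_ready and wait_for_node are hypothetical helpers, not part of Vault.

```shell
#!/bin/sh
# Sketch: decide from `vault status -format=json` output whether a node has
# become initialized and unsealed after `vault operator raft join`.

# Reads status JSON on stdin; succeeds only if initialized=true and sealed=false.
node_ready() {
  status_json=$(cat)
  printf '%s' "$status_json" | grep -Eq '"initialized"[[:space:]]*:[[:space:]]*true' &&
    printf '%s' "$status_json" | grep -Eq '"sealed"[[:space:]]*:[[:space:]]*false'
}

# Polls a node until it reports ready or the attempts run out.
# $1 = node API address (placeholder), $2 = max attempts (default 10).
wait_for_node() {
  addr=$1
  attempts=${2:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # `vault status` exits non-zero while sealed; only node_ready's result
    # decides the outcome of this pipeline.
    if vault status -address="$addr" -format=json 2>/dev/null | node_ready; then
      echo "ready: $addr"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  echo "not ready after $attempts attempts: $addr" >&2
  return 1
}
```

Running wait_for_node with a joining node's API address right after the join command would turn the silent failure described above into a non-zero exit status.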

The uninitialized nodes repeatedly print the following log messages:

[INFO]  core: security barrier not initialized
[INFO]  core: stored unseal keys supported, attempting fetch
[WARN]  failed to unseal core: error="stored unseal keys are supported, but none were found"
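
To tell whether a node is still stuck in this loop, the warning can be counted in captured log output. This is a minimal sketch that matches the exact message text quoted above; count_unseal_failures is a hypothetical helper name, and how the log reaches stdin (for example, from the container's log stream) is left to the reader.

```shell
#!/bin/sh
# Sketch: count occurrences of the "none were found" unseal warning in a
# Vault server log supplied on stdin.
count_unseal_failures() {
  # grep -c prints 0 but exits non-zero when nothing matches; `|| true`
  # keeps the function usable under `set -e`.
  grep -c 'stored unseal keys are supported, but none were found' || true
}
```

A count that keeps growing between two samples suggests the node is still retrying the stored-key fetch without success.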

Expected behavior

I expect the 3 nodes to all successfully auto-unseal and form a raft cluster.
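
One way to check this expectation from the active node is to count the rows printed by vault operator raft list-peers. The sketch below assumes the default table output has a header row and a dashed separator row, followed by one row per peer; count_peers and expect_peers are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: count raft peers from the table printed by
# `vault operator raft list-peers`.
# Assumption: two header lines (column names, dashes), then one row per peer.
count_peers() {
  awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# Succeeds only if the peer table on stdin lists exactly $1 peers.
expect_peers() {
  [ "$(count_peers)" -eq "$1" ]
}
```

For the cluster described here, piping the list-peers output into expect_peers 3 would succeed only once all three nodes appear in the raft configuration.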

Environment:

Vault Docker image: vault:1.10.3

Vault server configuration file(s):

disable_mlock = true
log_level = "trace"
ui = true

listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_disable = true
}

seal "azurekeyvault" {}

api_addr = "http://...:8200"
cluster_addr = "http://...:8201"

storage "raft" {
  node_id = ...
  path = "/vault/file/"
}

heatherezell commented 2 years ago

Hi there! I think your other raft nodes may not have all of the information they need in order to join your cluster. For example, you need to have TLS certificates installed that tell the raft nodes which nodes are trusted to join. Please see this document and let me know if you have more questions: https://learn.hashicorp.com/tutorials/vault/raft-deployment-guide?in=vault/raft

erhlee-bird commented 2 years ago

Hi, I took all the TLS-related configuration out to try and remove at least one dimension of complexity.

Otherwise, the full config looks like this:

disable_mlock = true
log_level     = "trace"
ui            = true

listener "tcp" {
  address            = "[::]:8200"
  cluster_address    = "[::]:8201"
  tls_cert_file      = "/vault/config/tls.crt"
  tls_client_ca_file = "/vault/config/ca.pem"
  tls_disable        = false
  tls_key_file       = "/vault/config/tls.key"
}

seal "azurekeyvault" {}

then from a script, I'm running

    vault operator raft join \
      -address="https://[${answer}]:8200" \
      -ca-path=/vault/config/ca.pem \
      -leader-ca-cert=@/vault/config/ca.pem \
      -leader-client-cert=@/vault/config/tls.crt \
      -leader-client-key=@/vault/config/tls.key

that's the point where I receive the response

Key       Value
---       -----
Joined    true

If I take away any of those TLS flags, the comms don't work at all, so I at least know that the certs are working.