hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Core Cluster Listener - error handshaking cluster connection #21775

Status: Open. xskrasek opened this issue 1 year ago.

xskrasek commented 1 year ago

Describe the bug

Hello, I am getting this in my debug log:

[DEBUG] core.cluster-listener: error handshaking cluster connection: error="tls: no certificates configured"

I have certificates configured in my tcp listener, and TLS works when accessing Vault. Could it be something similar to this? https://github.com/hashicorp/consul/issues/12286

I tried to Google this error, but found no mention of it in relation to Vault.

Environment:

Vault server configuration file(s):


ui = true
log_level          = "Debug"
#Recommended for raft storage
disable_mlock = true
cluster_addr  = ""
api_addr      = ""

storage "raft" {
  path    = "/opt/vault/data"
  node_id = ""

  retry_join {
    leader_api_addr         = ""

  }

  retry_join {
    leader_api_addr         = ""

  }

  retry_join {
    leader_api_addr         = ""

  }

  retry_join {
    leader_api_addr         = ""

  }

  retry_join {
    leader_api_addr         = ""

  }
}

# HTTPS listener
listener "tcp" {
  address       = "0.0.0.0:8200"
  # cluster_address = ""
  tls_cert_file = "/opt/vault/tls/tls.pem"
  tls_key_file = "/opt/vault/tls/key.pem"
  tls_disable_client_certs = "true"
  proxy_protocol_behavior = "use_always"
  proxy_protocol_authorized_addrs = "0.0.0.0/0"
}
maxb commented 1 year ago

The Vault cluster listener refers to the private communication on port 8201, which only ever takes place between one Vault server and another.

This listener does not use user-provided TLS certificates. Certificates are internally generated within the Vault code, and kept within the Vault storage backend.

It should not really be possible for the cluster listener to end up with no certificates at all.

Perhaps you could share more of your debug logs?

xskrasek commented 1 year ago

Unfortunately, this is all I get:

[DEBUG] core.cluster-listener: performing server cert lookup
[DEBUG] core.cluster-listener: error handshaking cluster connection: error="tls: no certificates configured"

and sometimes this one

[WARN]  auth.kubernetes.auth_kubernetes_7138c20e: Configured CA PEM data contains no valid certificates, TLS verification will fail

but I don't think the two are connected, though I might be wrong.

pporee commented 1 year ago

Hello @xskrasek and @maxb, I am seeing the same kind of issue when one of the standby Vault nodes restarts (in a cluster of 3 nodes): the restarted standby node runs into errors while rejoining the HA cluster.

Config:

default_lease_ttl = "168h"
max_lease_ttl = "87600h"
ui = true

listener "tcp" {
  address = "[::]:8200"
  cluster_address = "[::]:8201"
  tls_cert_file = "/vault/certs/tls.crt"
  tls_key_file = "/vault/certs/tls.key"
  tls_min_version = "tls12"
  telemetry {
    unauthenticated_metrics_access = true
  }
}

service_registration "kubernetes" {}

seal "awskms" {}

storage "s3" {}

ha_storage "raft" {
  path = "/vault/ha"
}

telemetry {
  prometheus_retention_time = "30s"
  disable_hostname = true
}

Master logs:

vault-external-0 vault {"@level":"info","@message":"aborting pipeline replication","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.199244Z","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.272341Z","error":"EOF","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.332770Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.338851Z","error":"dial tcp 10.80.23.51:8201: connect: connection refused","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.440375Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.447468Z","error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.449601Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to heartbeat to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.450559Z","backoff time":10000000,"error":"dial tcp 10.80.23.51:8201: connect: connection refused","peer":"vault-external-2.vault-external.domain.com:8201"}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.525850Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.526817Z","error":"dial tcp 10.80.23.51:8201: connect: connection refused","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.667657Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.673956Z","error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:53.806905Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:53.814913Z","error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:54.053049Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:54.058136Z","error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:54.086473Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to heartbeat to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:54.090940Z","backoff time":10000000,"error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":"vault-external-2.vault-external.domain.com:8201"}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:54.431049Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to appendEntries to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:54.434130Z","error":"dial tcp 10.80.23.51:8201: connect: connection refused","peer":{"Suffrage":1,"ID":"vault-external-2","Address":"vault-external-2.vault-external.domain.com:8201"}}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:26:54.765258Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to heartbeat to","@module":"ha.raft","@timestamp":"2023-10-13T12:26:54.769291Z","backoff time":10000000,"error":"dial tcp: lookup vault-external-2.vault-external.domain.com on 172.20.0.10:53: no such host","peer":"vault-external-2.vault-external.domain.com:8201"}
...
vault-external-0 vault {"@level":"trace","@message":"adding server to raft via autopilot","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.450610Z","id":"vault-external-2"}
vault-external-0 vault {"@level":"info","@message":"follower node answered the raft bootstrap challenge","@module":"system","@timestamp":"2023-10-13T12:28:02.450651Z","follower_server_id":"vault-external-2"}
vault-external-0 vault {"@level":"trace","@message":"received empty Vault version in heartbeat state. faking it with the leader version for now","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.647742Z","id":"vault-external-2","leader version":"1.13.6"}
vault-external-0 vault {"@level":"debug","@message":"failed to contact","@module":"ha.raft","@timestamp":"2023-10-13T12:28:04.726786Z","server-id":"vault-external-2","time":71527552035}
vault-external-0 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:05.284906Z","address":"vault-external-2.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-11404878-ce58-28d7-c92d-7501ef573b6a"}
vault-external-0 vault {"@level":"error","@message":"failed to heartbeat to","@module":"ha.raft","@timestamp":"2023-10-13T12:28:05.289189Z","backoff time":2500000000,"error":"remote error: tls: unrecognized name","peer":"vault-external-2.vault-external.domain.com:8201"}
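The integer fields in these JSON lines ("backoff time", "time") look like Go time.Duration values serialized as nanoseconds. The logs themselves are consistent with that reading: the "failed to contact" time of 71527552035 is about 71.5 s, which matches the gap between the first "aborting pipeline replication" line (12:26:53.199) and the "failed to contact" line (12:28:04.726). A small Go sketch for converting such values, assuming nanosecond units:

```go
package main

import (
	"fmt"
	"time"
)

// humanize renders an integer taken from a JSON log field as a duration.
// Assumption: the integer counts nanoseconds (Go's time.Duration unit).
func humanize(ns int64) string {
	return time.Duration(ns).String()
}

func main() {
	// Values copied from the leader's log lines above.
	fields := []struct {
		name string
		ns   int64
	}{
		{"backoff time (early retries)", 10000000},
		{"backoff time (unrecognized name)", 2500000000},
		{"failed to contact: time", 71527552035},
	}
	for _, f := range fields {
		// Prints 10ms, 2.5s and 1m11.527552035s respectively.
		fmt.Printf("%-34s %s\n", f.name, humanize(f.ns))
	}
}
```

The same conversion applies to the raft timeouts in the restarted node's log below, where 15000000000 and 5000000000 read as 15s and 5s.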

Restarted standby node (vault-external-2) logs:

vault-external-2 vault {"@level":"info","@message":"http: TLS handshake error from 127.0.0.1:33501: EOF","@timestamp":"2023-10-13T12:28:01.053434Z"}
vault-external-2 vault {"@level":"warn","@message":"no TLS config found for ALPN","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:01.975919Z","ALPN":["raft_storage_v1"]}
vault-external-2 vault {"@level":"debug","@message":"error handshaking cluster connection","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:01.975983Z","error":"unsupported protocol"}
vault-external-2 vault {"@level":"info","@message":"attempting to join possible raft leader node","@module":"core","@timestamp":"2023-10-13T12:28:02.298289Z","leader_addr":"https://vault-external.domain.com:8200"}
vault-external-2 vault {"@level":"warn","@message":"cluster listener is already started","@module":"core","@timestamp":"2023-10-13T12:28:02.453445Z"}
vault-external-2 vault {"@level":"trace","@message":"setting up raft cluster","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.453477Z"}
vault-external-2 vault {"@level":"trace","@message":"applying raft config","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.453485Z","inputs":{"path":"/vault/ha"}}
vault-external-2 vault {"@level":"trace","@message":"using larger timeouts for raft at startup","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.453496Z","initial_election_timeout":15000000000,"initial_heartbeat_timeout":15000000000,"normal_election_timeout":5000000000,"normal_heartbeat_timeout":5000000000}
vault-external-2 vault {"@level":"info","@message":"creating Raft","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.457561Z","config":"\u0026raft.Config{ProtocolVersion:3, HeartbeatTimeout:15000000000, ElectionTimeout:15000000000, CommitTimeout:50000000, MaxAppendEntries:64, BatchApplyCh:true, ShutdownOnRemove:true, TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, LeaderLeaseTimeout:2500000000, LocalID:\"vault-external-2\", NotifyCh:(chan\u003c- bool)(0x40011e6460), LogOutput:io.Writer(nil), LogLevel:\"DEBUG\", Logger:(*hclog.interceptLogger)(0x4001757260), NoSnapshotRestoreOnStart:true, skipStartup:false}"}
vault-external-2 vault {"@level":"info","@message":"initial configuration","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.459400Z","index":1,"servers":"[{Suffrage:Voter ID:vault-external-0 Address:vault-external-0.vault-external.domain.com:8201} {Suffrage:Voter ID:vault-external-2 Address:vault-external-2.vault-external.domain.com:8201} {Suffrage:Voter ID:vault-external-1 Address:vault-external-1.vault-external.domain.com:8201}]"}
vault-external-2 vault {"@level":"trace","@message":"finished setting up raft cluster","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.459456Z"}
vault-external-2 vault {"@level":"info","@message":"successfully joined the raft cluster","@module":"core","@timestamp":"2023-10-13T12:28:02.459475Z","leader_addr":"https://vault-external.domain.com:8200"}
vault-external-2 vault {"@level":"info","@message":"entering follower state","@module":"ha.raft","@timestamp":"2023-10-13T12:28:02.459466Z","follower":{},"leader-address":"","leader-id":""}
vault-external-2 vault {"@level":"debug","@message":"performing server cert lookup","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:05.289024Z"}
vault-external-2 vault {"@level":"debug","@message":"error handshaking cluster connection","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:05.289088Z","error":"tls: no certificates configured"}
vault-external-2 vault {"@level":"debug","@message":"performing server cert lookup","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:21.413145Z"}
vault-external-2 vault {"@level":"debug","@message":"error handshaking cluster connection","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:21.413208Z","error":"tls: no certificates configured"}
vault-external-2 vault {"@level":"debug","@message":"performing server cert lookup","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:24.658240Z"}
vault-external-2 vault {"@level":"debug","@message":"error handshaking cluster connection","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:24.658310Z","error":"tls: no certificates configured"}
vault-external-2 vault {"@level":"warn","@message":"heartbeat timeout reached, starting election","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.388956Z","last-leader-addr":"","last-leader-id":""}
vault-external-2 vault {"@level":"info","@message":"entering candidate state","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.389001Z","node":{},"term":2}
vault-external-2 vault {"@level":"debug","@message":"asking for vote","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.390436Z","address":"vault-external-0.vault-external.domain.com:8201","from":"vault-external-0","term":2}
vault-external-2 vault {"@level":"debug","@message":"voting for self","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.390479Z","id":"vault-external-2","term":2}
vault-external-2 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:27.390595Z","address":"vault-external-0.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-6f164edb-c478-538f-e50d-d228275ac3ee"}
vault-external-2 vault {"@level":"debug","@message":"asking for vote","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.394575Z","address":"vault-external-1.vault-external.domain.com:8201","from":"vault-external-1","term":2}
vault-external-2 vault {"@level":"debug","@message":"calculated votes needed","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.394632Z","needed":2,"term":2}
vault-external-2 vault {"@level":"debug","@message":"vote granted","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.394649Z","from":"vault-external-2","tally":1,"term":2}
vault-external-2 vault {"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:27.394692Z","address":"vault-external-1.vault-external.domain.com:8201","alpn":"raft_storage_v1","host":"raft-6f164edb-c478-538f-e50d-d228275ac3ee"}
vault-external-2 vault {"@level":"error","@message":"failed to make requestVote RPC","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.395290Z","error":"remote error: tls: unrecognized name","target":{"Suffrage":0,"ID":"vault-external-0","Address":"vault-external-0.vault-external.domain.com:8201"},"term":2}
vault-external-2 vault {"@level":"debug","@message":"performing client cert lookup","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:27.416553Z"}
vault-external-2 vault {"@level":"trace","@message":"triggering raft config reload due to being candidate or leader","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.417916Z"}
vault-external-2 vault {"@level":"trace","@message":"reloaded raft config to set lower timeouts","@module":"ha.raft","@timestamp":"2023-10-13T12:28:27.417993Z","config":"raft.ReloadableConfig{TrailingLogs:0x2800, SnapshotInterval:120000000000, SnapshotThreshold:0x2000, HeartbeatTimeout:5000000000, ElectionTimeout:5000000000}"}
vault-external-2 vault {"@level":"debug","@message":"performing server cert lookup","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:27.925355Z"}
vault-external-2 vault {"@level":"debug","@message":"error handshaking cluster connection","@module":"core.cluster-listener","@timestamp":"2023-10-13T12:28:27.925422Z","error":"tls: no certificates configured"}
raskchanky commented 1 year ago

@xskrasek I notice you have proxying configured in your Vault config. Have you tried removing that and seeing if the error persists?

chris-burn-phocas commented 12 months ago

Hello! I've also been experiencing the same issue. I'm running in AWS ECS and get the same error message when new tasks spin up and try to join.

[DEBUG] core.cluster-listener: error handshaking cluster connection: error="tls: no certificates configured"

I'm using integrated storage and I've confirmed, both through logs and vault operator raft list-peers, that the new nodes are able to join the raft cluster as non-voters (presumably because that step takes place over port 8200?).

I've also found, annoyingly, that the issue seems to be intermittent. New nodes couldn't unseal yesterday, but today new nodes were able to join the same cluster without a problem.

I've never used any proxying. Here's my config:

api_addr      = "https://{{ GetPrivateIP }}:8200"
cluster_addr  = "https://{{ GetPrivateIP }}:8201"
cluster_name  = "DEV"
disable_mlock = true
ui            = true

log_level = "Debug"

storage "raft" {
  path = "/vault/file"

  retry_join {
    leader_api_addr = "https://vault.local:8200"
  }
}

listener "tcp" {
  address         = "0.0.0.0:8200"
  cluster_address = "{{ GetPrivateIP }}:8201"
  tls_cert_file   = "/vault/tls/vault-cert.pem"
  tls_key_file    = "/vault/tls/vault-key.pem"
  tls_min_version = "tls13"
}

seal "awskms" {
  kms_key_id = "alias/vault-seal-dev"
}

telemetry {
  disable_hostname          = true
  prometheus_retention_time = "30s"
}
geldmon commented 12 months ago

I'm also experiencing the same issue in a local setup with 2 nodes.

Node1 (10.10.0.1) [ERROR] storage.raft: failed to make requestVote RPC: target="{Voter node2 10.10.0.2:8201}" error="remote error: tls: unrecognized name" term=13503

Node2 (10.10.0.2) [DEBUG] core.cluster-listener: error handshaking cluster connection: error="tls: no certificates configured"

api_addr      = "https://10.10.0.1:8200/"
cluster_addr  = "https://10.10.0.1:8201/"
disable_mlock = true
ui            = true

log_level = "trace"

storage "raft" {
  path = "/vault/file"

  retry_join {
    leader_api_addr = "https://10.10.0.2:8200/"
  }
}

listener "tcp" {
  address         = "10.10.0.1:8200"
  cluster_address = "10.10.0.1:8201"
  tls_disable_client_certs = "true"
  tls_cert_file   = "/vault/tls/vault-cert.pem"
  tls_key_file    = "/vault/tls/vault-key.pem"
}

telemetry {
  prometheus_retention_time = "24h"
  disable_hostname = true
}
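On the Raft side, an election needs a strict majority of voters. The restarted node's log above reports "needed":2 for three voters, and the same arithmetic gives 2 for a two-node cluster, so in a two-node setup neither node can become leader on its own vote while requestVote RPCs to the other node keep failing. A minimal Go sketch of the majority rule; quorum is a hypothetical helper name.

```go
package main

import "fmt"

// quorum (hypothetical helper) returns the votes a Raft candidate needs:
// a strict majority of the voting members, floor(voters/2) + 1.
func quorum(voters int) int {
	return voters/2 + 1
}

func main() {
	for _, n := range []int{1, 2, 3, 5} {
		// Two voters tolerate no failures; three tolerate one.
		fmt.Printf("voters=%d quorum=%d tolerated failures=%d\n", n, quorum(n), n-quorum(n))
	}
}
```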
raskchanky commented 12 months ago

Thank you all for the additional details. I'll investigate further.

pporee commented 11 months ago

Hello @raskchanky,

FYI, I can make the faulty node join the Raft cluster by manually running vault operator step-down.

Regards, Pierre