hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

[ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool" #9175

Open jacekjaros opened 4 years ago

jacekjaros commented 4 years ago

Describe the bug: At random moments Vault loses its connection to Cassandra, which is used as the secrets storage. When this happens, Vault is unable to recover.

Jun 09 08:25:30 cluster1-vault01 vault[14227]: 2020-06-09T08:25:30.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:25:40 cluster1-vault01 vault[14227]: 2020-06-09T08:25:40.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:25:50 cluster1-vault01 vault[14227]: 2020-06-09T08:25:50.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:00 cluster1-vault01 vault[14227]: 2020-06-09T08:26:00.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:10 cluster1-vault01 vault[14227]: 2020-06-09T08:26:10.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:20 cluster1-vault01 vault[14227]: 2020-06-09T08:26:20.868Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"
Jun 09 08:26:30 cluster1-vault01 vault[14227]: 2020-06-09T08:26:30.869Z [ERROR] core: key rotation periodic upgrade check failed: error="gocql: no hosts available in the pool"

To Reproduce Steps to reproduce the behavior:

  1. Run vault server
  2. Wait

Expected behavior: Vault should recover (perhaps by reconnecting to Cassandra?)

Environment:

Vault server configuration file(s):

cluster_name = "dc1"
max_lease_ttl = "768h"
default_lease_ttl = "768h"
disable_clustering = "False"
cluster_addr = "https://cluster1-vault01.mydomain.com:8201"
api_addr = "https://cluster1-vault01.mydomain.com:8200"

plugin_directory = "/usr/local/lib/vault/plugins"

listener "tcp" {
  address = "192.168.1.2:8200"
  cluster_address = "192.168.1.2:8201"
  tls_cert_file = "/etc/vault/tls/global.mydomain.com.crt"
  tls_key_file = "/etc/vault/tls/global.mydoamin.com.key"
  tls_client_ca_file="/etc/vault/tls/MyDomain.crt"
  tls_min_version  = "tls12"
  tls_prefer_server_cipher_suites = "false"
  tls_disable = "false"
}

backend "cassandra" {
  hosts = "dev-cassandra.mydomain.com"
  consistency = "LOCAL_QUORUM"
  protocol_version = "4"
  username = "vault"
  password = "XXXXXX"
  tls = "1"
  pem_bundle_file = "/etc/vault/tls/gossip.pem"
  tls_skip_verify = "1"
  connection_timeout = "5"
}

ha_backend "consul" {
  address = "127.0.0.1:8500"
  path = "vault"
  service = "vault"
  scheme = "http"
}

ui = true

telemetry {
    prometheus_retention_time = "180s"
}

Additional context: The cluster was built on top of 6 nodes. For now we have only one test Vault agent, which pulls a single secret, so traffic is very low.

jacekjaros commented 4 years ago

Hi,

Good news: I was able to find the root cause of my issue. Cassandra passes the client (Vault) a list of servers that contains private IP addresses, which are not accessible from the Vault cluster. I'm aware that this is a Cassandra misconfiguration; however, Vault doesn't allow me to use the workaround provided by the gocql driver, which is to set the DisableInitialHostLookup option to true.

Would it be possible to expose this parameter in the Vault configuration?
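
For anyone trying to confirm the same root cause, here is a minimal diagnostic sketch in Go. It does not use gocql or Vault. It only takes a list of addresses and reports which ones fall in private ranges, since those are the ones a client outside that network cannot reach. The function name `privateHosts` and the sample peer list are made up for illustration; in practice the list would come from whatever addresses your Cassandra nodes advertise.

```go
package main

import (
	"fmt"
	"net/netip"
)

// privateHosts returns the entries of addrs that parse as private-range IP
// addresses (RFC 1918 for IPv4, RFC 4193 for IPv6). Entries that are not
// IP literals, such as DNS names, are skipped.
func privateHosts(addrs []string) []string {
	var private []string
	for _, a := range addrs {
		ip, err := netip.ParseAddr(a)
		if err != nil {
			continue // not an IP literal, e.g. a hostname
		}
		if ip.IsPrivate() {
			private = append(private, a)
		}
	}
	return private
}

func main() {
	// Hypothetical peer list standing in for the addresses Cassandra
	// hands back to the client; these values are not from the issue.
	peers := []string{
		"10.0.1.15",
		"172.16.4.20",
		"203.0.113.7",
		"dev-cassandra.mydomain.com",
	}
	fmt.Println(privateHosts(peers))
}
```

If the advertised peers show up in this list while Vault connects through a public or load-balanced hostname, the symptom described above is the likely result. This sketch only flags candidates; it does not test actual network reachability.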

Best regards, Jacek

kilocaleb commented 4 years ago

Hi,

It looks like this option would be very helpful in a lot of Vault + Cassandra deployments (especially in AWS). I created a PR for it: https://github.com/hashicorp/vault/pull/9733

-- kilocaleb