Describe the bug
We deploy Vault and etcd on 3 machines; each machine runs one Vault instance and one etcd instance.
When we bring down the NIC on the Vault leader machine, the other two Vault nodes enter a failure state:
# vault status
{"level":"debug","ts":"2020-09-10T17:55:56.054Z","caller":"balancer/balancer.go:60","msg":"registered balancer","policy":"picker-roundrobin-balanced","name":"etcd-picker-roundrobin-balanced"}
Error checking leader status: Error making API request.
URL: GET https://.../v1/sys/leader
Code: 500. Errors:
* context deadline exceeded
or
# vault status
{"level":"debug","ts":"2020-09-10T18:04:20.886Z","caller":"balancer/balancer.go:60","msg":"registered balancer","policy":"picker-roundrobin-balanced","name":"etcd-picker-roundrobin-balanced"}
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           10
Threshold              5
Version                1.4.6
Cluster Name           vault-cluster-8bc5a80a
Cluster ID             52c8416b-0170-f8e9-25cf-887730acb85b
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>
If we bring down the NIC on a machine hosting a standby Vault, a similar failure is observed.
Changing etcd_api to v2 solves the issue, but we don't want to use v2.
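For readers unfamiliar with the setting, the etcd_api option lives in the storage stanza of the Vault server configuration. The fragment below is only a sketch of what such a stanza might look like for a three-node setup. The addresses and path are placeholder assumptions, not the values from this deployment, since the actual configuration file was not attached.

```hcl
# Hypothetical storage stanza for the etcd backend.
# Addresses and path are placeholders, not the reporter's real values.
storage "etcd" {
  address    = "https://etcd-1.example:2379,https://etcd-2.example:2379,https://etcd-3.example:2379"
  etcd_api   = "v3"     # switching this to "v2" is the workaround mentioned above
  ha_enabled = "true"   # consistent with "HA Enabled true" in the status output
  path       = "/vault/"
}
```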
If we first stop etcd on the Vault leader machine and then bring down the NIC, everything works well. The logs below show that subconn-size is reduced from 4 to 3, meaning the stopped etcd is temporarily removed from the pool.
logs from standby vault
etcd state
Expected behavior
The Vault cluster should keep working when a machine hosting both Vault and etcd becomes unreachable.
Environment:
Vault Server Version (retrieve with vault status): 1.4.6
Vault CLI Version (retrieve with vault version): 1.4.6
Vault server configuration file(s):