hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

HA failure when machine with one vault and one etcd is offline #9920

Open tiagolicanton opened 4 years ago

tiagolicanton commented 4 years ago

Describe the bug

We deploy Vault and etcd on three machines; each machine runs one Vault node and one etcd member. When we bring down the NIC on the Vault leader's machine, the other two Vault nodes enter a failure state:

# vault status
{"level":"debug","ts":"2020-09-10T17:55:56.054Z","caller":"balancer/balancer.go:60","msg":"registered balancer","policy":"picker-roundrobin-balanced","name":"etcd-picker-roundrobin-balanced"}
Error checking leader status: Error making API request.

URL: GET https://.../v1/sys/leader
Code: 500. Errors:

* context deadline exceeded

or

# vault status
{"level":"debug","ts":"2020-09-10T18:04:20.886Z","caller":"balancer/balancer.go:60","msg":"registered balancer","policy":"picker-roundrobin-balanced","name":"etcd-picker-roundrobin-balanced"}
Key                    Value
---                    -----
Seal Type              shamir
Initialized            true
Sealed                 false
Total Shares           10
Threshold              5
Version                1.4.6
Cluster Name           vault-cluster-8bc5a80a
Cluster ID             52c8416b-0170-f8e9-25cf-887730acb85b
HA Enabled             true
HA Cluster             n/a
HA Mode                standby
Active Node Address    <none>

Logs from a standby Vault:

Sep 10 23:40:55 node-1 vault[8722]: 2020-09-10T23:40:55.651Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Sep 10 23:40:56 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:40:56.162Z","caller":"picker/roundrobin_balanced.go:91","msg":"balancer failed","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","success":false,"bytes-sent":true,"bytes-received":false}
Sep 10 23:40:56 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:40:56.162Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-2699c3d4-922a-4dbd-865a-d27ff41bdb7c/127.0.0.1:6002","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Sep 10 23:40:56 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:56.826Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","subconn-index":1,"subconn-size":4}
Sep 10 23:40:56 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:56.827Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:40:56 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:56.972Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.5:6000","subconn-index":2,"subconn-size":4}
Sep 10 23:40:56 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:56.973Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.5:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:40:58 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:58.662Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","subconn-index":3,"subconn-size":4}
Sep 10 23:41:00 node-1 vault[8722]: 2020-09-10T23:41:00.651Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.142Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","subconn-index":0,"subconn-size":4}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.143Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.154Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","subconn-index":1,"subconn-size":4}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.155Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.155Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.5:6000","subconn-index":2,"subconn-size":4}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.156Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.5:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.870Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","subconn-index":3,"subconn-size":4}
Sep 10 23:41:03 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:41:03.662Z","caller":"picker/roundrobin_balanced.go:91","msg":"balancer failed","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","success":false,"bytes-sent":true,"bytes-received":false}
Sep 10 23:41:03 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:41:03.662Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-2699c3d4-922a-4dbd-865a-d27ff41bdb7c/127.0.0.1:6002","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Sep 10 23:41:05 node-1 vault[8722]: 2020-09-10T23:41:05.651Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Sep 10 23:41:06 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:06.162Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","subconn-index":0,"subconn-size":4}
Sep 10 23:41:06 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:06.163Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:41:06 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:06.829Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","subconn-index":1,"subconn-size":4}
Sep 10 23:41:06 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:06.829Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:41:06 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:41:06.870Z","caller":"picker/roundrobin_balanced.go:91","msg":"balancer failed","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","success":false,"bytes-sent":true,"bytes-received":false}
Sep 10 23:41:06 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:41:06.870Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-2699c3d4-922a-4dbd-865a-d27ff41bdb7c/127.0.0.1:6002","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Sep 10 23:41:06 node-1 vault[8722]: 2020-09-10T23:41:06.870Z [ERROR] core: error checking health: error="context deadline exceeded"
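
To see at a glance which etcd endpoints fail in a log like the one above, the etcd client's JSON lines can be tallied per address. This is only a rough sketch: `tally_balancer_events` and `SAMPLE` are names made up for illustration, and `SAMPLE` holds a few lines copied from the excerpt above rather than the full log.

```python
import json
from collections import Counter

def tally_balancer_events(lines):
    """Count 'balancer done' / 'balancer failed' events per etcd endpoint."""
    counts = {}
    for line in lines:
        start = line.find('{"level"')
        if start == -1:
            continue  # plain Vault log line, no etcd client JSON payload
        try:
            event = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or malformed JSON, skip it
        msg = event.get("msg")
        if msg in ("balancer done", "balancer failed"):
            counts.setdefault(event["address"], Counter())[msg] += 1
    return counts

# A few lines copied verbatim from the standby log above.
SAMPLE = [
    'Sep 10 23:40:55 node-1 vault[8722]: 2020-09-10T23:40:55.651Z [DEBUG] core: forwarding: error sending echo request to active node: error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"',
    'Sep 10 23:40:56 node-1 vault[8722]: {"level":"warn","ts":"2020-09-10T23:40:56.162Z","caller":"picker/roundrobin_balanced.go:91","msg":"balancer failed","error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","success":false,"bytes-sent":true,"bytes-received":false}',
    'Sep 10 23:40:56 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:40:56.827Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}',
    'Sep 10 23:41:01 node-1 vault[8722]: {"level":"debug","ts":"2020-09-10T23:41:01.143Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","success":true,"bytes-sent":true,"bytes-received":true}',
]

counts = tally_balancer_events(SAMPLE)
for address, events in sorted(counts.items()):
    print(address, dict(events))
```

Run against the full excerpt, every "balancer failed" event is for http://10.10.10.3:6000, the machine whose NIC was brought down.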

etcd state

# ETCD_CLIENT_DEBUG=1 ETCDCTL_API=3 etcdctl -w table endpoint status --cluster --endpoints=10.10.10.3:6000,10.10.10.4:6000,10.10.10.5:6000
{"level":"debug","ts":"2020-09-10T23:42:50.263Z","caller":"balancer/balancer.go:60","msg":"registered balancer","policy":"picker-roundrobin-balanced","name":"etcd-picker-roundrobin-balanced"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:97","msg":"built balancer","balancer-id":"c5k35zolpq4a","policy":"picker-roundrobin-balanced","resolver-target":"endpoint://client-487dbbd8-c75e-4aa9-90bf-98f18670c6a9/10.10.10.3:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:97","msg":"built balancer","balancer-id":"c5k35zolruuo","policy":"picker-roundrobin-balanced","resolver-target":"endpoint://client-1c6091fc-03fa-4671-b9d9-282f159630af/10.10.10.3:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:148","msg":"resolved","picker":"picker-error","balancer-id":"c5k35zolpq4a","addresses":["10.10.10.3:6000","10.10.10.4:6000","10.10.10.5:6000"]}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:148","msg":"resolved","picker":"picker-error","balancer-id":"c5k35zolruuo","addresses":["10.10.10.3:6000","10.10.10.4:6000","10.10.10.5:6000"]}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.3:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.3:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.4:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.4:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.5:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:166","msg":"created subconn","address":"10.10.10.5:6000"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolruuo","connected":false,"subconn":"0xc000165670","subconn-size":3,"address":"10.10.10.3:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolruuo","connected":false,"subconn":"0xc000165690","subconn-size":3,"address":"10.10.10.4:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolruuo","connected":false,"subconn":"0xc0001656b0","subconn-size":3,"address":"10.10.10.5:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolpq4a","connected":false,"subconn":"0xc00014e8d0","subconn-size":3,"address":"10.10.10.3:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolpq4a","connected":false,"subconn":"0xc00014e8f0","subconn-size":3,"address":"10.10.10.4:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.266Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolpq4a","connected":false,"subconn":"0xc00014e910","subconn-size":3,"address":"10.10.10.5:6000","old-state":"IDLE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:50.266Z","caller":"connectivity/connectivity.go:81","msg":"connectivity recorder received unknown state","connectivity-state":"IDLE"}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolruuo","connected":true,"subconn":"0xc000165690","subconn-size":3,"address":"10.10.10.4:6000","old-state":"CONNECTING","new-state":"READY"}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:278","msg":"updated picker","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolruuo","policy":"picker-roundrobin-balanced","subconn-ready":["10.10.10.4:6000 (0xc000165690)"],"subconn-size":1}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-error","balancer-id":"c5k35zolpq4a","connected":true,"subconn":"0xc00014e8f0","subconn-size":3,"address":"10.10.10.4:6000","old-state":"CONNECTING","new-state":"READY"}
{"level":"debug","ts":"2020-09-10T23:42:50.269Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"10.10.10.4:6000","subconn-index":0,"subconn-size":1}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:278","msg":"updated picker","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolpq4a","policy":"picker-roundrobin-balanced","subconn-ready":["10.10.10.4:6000 (0xc00014e8f0)"],"subconn-size":1}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolpq4a","connected":true,"subconn":"0xc00014e910","subconn-size":3,"address":"10.10.10.5:6000","old-state":"CONNECTING","new-state":"READY"}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:278","msg":"updated picker","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolpq4a","policy":"picker-roundrobin-balanced","subconn-ready":["10.10.10.4:6000 (0xc00014e8f0)","10.10.10.5:6000 (0xc00014e910)"],"subconn-size":2}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolruuo","connected":true,"subconn":"0xc0001656b0","subconn-size":3,"address":"10.10.10.5:6000","old-state":"CONNECTING","new-state":"READY"}
{"level":"info","ts":"2020-09-10T23:42:50.269Z","caller":"balancer/balancer.go:278","msg":"updated picker","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolruuo","policy":"picker-roundrobin-balanced","subconn-ready":["10.10.10.4:6000 (0xc000165690)","10.10.10.5:6000 (0xc0001656b0)"],"subconn-size":2}
{"level":"debug","ts":"2020-09-10T23:42:50.270Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}
{"level":"info","ts":"2020-09-10T23:42:53.298Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolpq4a","connected":false,"subconn":"0xc00014e8d0","subconn-size":3,"address":"10.10.10.3:6000","old-state":"CONNECTING","new-state":"TRANSIENT_FAILURE"}
{"level":"info","ts":"2020-09-10T23:42:54.299Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k35zolpq4a","connected":false,"subconn":"0xc00014e8d0","subconn-size":3,"address":"10.10.10.3:6000","old-state":"TRANSIENT_FAILURE","new-state":"CONNECTING"}
{"level":"warn","ts":"2020-09-10T23:42:55.270Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///http://10.10.10.3:6000","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.10.10.3:6000: connect: no route to host\""}
Failed to get the status of endpoint http://10.10.10.3:6000 (context deadline exceeded)
+------------------------+------------------+---------+---------+-----------+-----------+------------+
|        ENDPOINT        |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
|  http://localhost:6000 | a25a29d066a34a1b |  3.3.18 |  1.1 MB |      true |       285 |       1065 |
| http://10.10.10.4:6000 | a25a29d066a34a1b |  3.3.18 |  1.1 MB |      true |       285 |       1065 |
|  http://localhost:6000 | a25a29d066a34a1b |  3.3.18 |  1.1 MB |      true |       285 |       1065 |
| http://10.10.10.5:6000 | d274d29779d8ad22 |  3.3.18 |  1.1 MB |     false |       285 |       1065 |
|  http://localhost:6000 | a25a29d066a34a1b |  3.3.18 |  1.1 MB |      true |       285 |       1065 |
+------------------------+------------------+---------+---------+-----------+-----------+------------+
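
The table suggests etcd itself kept quorum: two of the three members answered, one of them as leader, and both report the same Raft term and index. For reference, the majority arithmetic can be sketched as follows (`quorum` and `has_quorum` are illustrative helper names, not part of etcd or Vault):

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1

def has_quorum(members: int, reachable: int) -> bool:
    """True if the reachable members can still elect a leader and commit writes."""
    return reachable >= quorum(members)

# Three etcd members, one unreachable after the NIC was brought down.
print(quorum(3), has_quorum(3, 2))  # 2 True
```

So losing one machine out of three should not, by itself, make the etcd storage unavailable.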

If we bring down the NIC on a machine running a standby Vault, a similar failure is observed. Changing etcd_api to v2 solves the issue, but we don't want to use v2.

If we first stop etcd on the Vault leader's machine and then bring down the NIC, everything works well. The logs below show subconn-size dropping from 4 to 3: the stopped etcd member is temporarily removed from the pool.

Sep 10 23:46:10 node-1 vault[9621]: {"level":"warn","ts":"2020-09-10T23:46:10.657Z","caller":"picker/roundrobin_balanced.go:91","msg":"balancer failed","error":"rpc error: code = Unavailable desc = transport is closing","picker":"picker-roundrobin-balanced","address":"http://10.10.10.3:6000","success":false,"bytes-sent":true,"bytes-received":true}
Sep 10 23:46:10 node-1 vault[9621]: {"level":"debug","ts":"2020-09-10T23:46:10.657Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","subconn-index":1,"subconn-size":4}
Sep 10 23:46:10 node-1 vault[9621]: {"level":"info","ts":"2020-09-10T23:46:10.657Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k388r4uqxu","connected":false,"subconn":"0xc00079c060","subconn-size":4,"address":"http://10.10.10.3:6000","old-state":"READY","new-state":"CONNECTING"}
Sep 10 23:46:10 node-1 vault[9621]: {"level":"info","ts":"2020-09-10T23:46:10.657Z","caller":"balancer/balancer.go:278","msg":"updated picker","picker":"picker-roundrobin-balanced","balancer-id":"c5k388r4uqxu","policy":"picker-roundrobin-balanced","subconn-ready":["http://10.10.10.4:6000 (0xc00079c0a0)","http://10.10.10.5:6000 (0xc00079c0d0)","http://localhost:6000 (0xc00079c080)"],"subconn-size":3}
Sep 10 23:46:10 node-1 vault[9621]: {"level":"info","ts":"2020-09-10T23:46:10.658Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k388r4uqxu","connected":false,"subconn":"0xc00079c060","subconn-size":4,"address":"http://10.10.10.3:6000","old-state":"CONNECTING","new-state":"TRANSIENT_FAILURE"}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"debug","ts":"2020-09-10T23:46:11.150Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","subconn-index":0,"subconn-size":3}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"debug","ts":"2020-09-10T23:46:11.150Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://localhost:6000","success":true,"bytes-sent":true,"bytes-received":true}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"info","ts":"2020-09-10T23:46:11.658Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k388r4uqxu","connected":false,"subconn":"0xc00079c060","subconn-size":4,"address":"http://10.10.10.3:6000","old-state":"TRANSIENT_FAILURE","new-state":"CONNECTING"}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"info","ts":"2020-09-10T23:46:11.658Z","caller":"balancer/balancer.go:214","msg":"state changed","picker":"picker-roundrobin-balanced","balancer-id":"c5k388r4uqxu","connected":false,"subconn":"0xc00079c060","subconn-size":4,"address":"http://10.10.10.3:6000","old-state":"CONNECTING","new-state":"TRANSIENT_FAILURE"}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"debug","ts":"2020-09-10T23:46:11.845Z","caller":"picker/roundrobin_balanced.go:70","msg":"picked","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","subconn-index":1,"subconn-size":3}
Sep 10 23:46:11 node-1 vault[9621]: {"level":"debug","ts":"2020-09-10T23:46:11.846Z","caller":"picker/roundrobin_balanced.go:89","msg":"balancer done","picker":"picker-roundrobin-balanced","address":"http://10.10.10.4:6000","success":true,"bytes-sent":true,"bytes-received":true}
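
One way to read the difference between the two log excerpts is with a toy round-robin model. This is not the etcd client's actual picker code, and `simulate`, `ENDPOINTS`, and `DOWN` are names made up for illustration. In the failing case http://10.10.10.3:6000 stays in the pool (subconn-size 4) and is still picked in rotation; in the working case the pool shrinks to 3 and it is never picked.

```python
from itertools import cycle, islice

def simulate(pool, unreachable, requests):
    """Round-robin over pool; a request sent to an unreachable address times out."""
    outcomes = []
    for address in islice(cycle(pool), requests):
        result = "timeout" if address in unreachable else "ok"
        outcomes.append((address, result))
    return outcomes

def timeouts(outcomes):
    """Number of requests that hit an unreachable endpoint."""
    return sum(1 for _, result in outcomes if result == "timeout")

# Pool order follows the subconn-index values in the logs (0..3).
ENDPOINTS = [
    "http://localhost:6000",
    "http://10.10.10.4:6000",
    "http://10.10.10.5:6000",
    "http://10.10.10.3:6000",
]
DOWN = {"http://10.10.10.3:6000"}

# NIC brought down: the dead endpoint is still in the pool.
nic_down = simulate(ENDPOINTS, DOWN, 8)
# etcd stopped first: the endpoint has been removed from the pool.
etcd_stopped = simulate([e for e in ENDPOINTS if e not in DOWN], DOWN, 8)

print(timeouts(nic_down), timeouts(etcd_stopped))  # 2 0
```

In this model one request in four runs into the deadline while the dead endpoint remains in the pool, which matches the repeating "picked" then "balancer failed" pattern for 10.10.10.3 in the first excerpt. Why the client keeps that subconn in the pool after the NIC goes down is not established by these logs.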

Expected behavior

The Vault cluster keeps working when a machine hosting both Vault and etcd is unreachable.

Environment:

Vault server configuration file(s):

storage "etcd" {
  address = "http://10.10.10.3:6000,http://10.10.10.4:6000,http://10.10.10.5:6000" # local access to etcd; change if etcd moves
  etcd_api="v3"
  path = "vault/"
  ha_enabled = "true"
}

listener "tcp" {
  address = "0.0.0.0:7310"
  cluster_address = "0.0.0.0:6004"
  tls_cert_file = "{omitted}"
  tls_key_file = "{omitted}"
  tls_min_version = "tls12"
  tls_prefer_server_cipher_suites = "true"
}

api_addr = "https://10.10.10.3:7310"
cluster_addr = "https://10.10.10.3:6004"
disable_clustering = "false"
log_level = "Debug"
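
When reproducing this, it can help to check from a standby node which of the etcd endpoints in the `address` setting still accept TCP connections. The sketch below splits the comma-separated value from the config above and offers a simple probe. `parse_endpoints` and `is_reachable` are hypothetical helper names, and the 2-second timeout is an arbitrary choice.

```python
import socket
from urllib.parse import urlsplit

def parse_endpoints(address: str):
    """Split Vault's comma-separated etcd address value into (host, port) pairs."""
    endpoints = []
    for url in address.split(","):
        parts = urlsplit(url.strip())
        endpoints.append((parts.hostname, parts.port))
    return endpoints

def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Value copied from the storage "etcd" stanza above.
ADDRESS = "http://10.10.10.3:6000,http://10.10.10.4:6000,http://10.10.10.5:6000"

for host, port in parse_endpoints(ADDRESS):
    print(f"{host}:{port}")
```

Calling `is_reachable(host, port)` for each pair on the affected network shows which members are unreachable at the TCP level. It says nothing about whether the etcd client inside Vault has noticed.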


heatherezell commented 5 months ago

Hello, is this still an issue? Does the issue persist when using a newer version of Vault? Please let me know. Thanks!