Hello -
This may be an issue with how the pod is destroyed according to the Helm chart, but I'm not 100% versed in Helm and Kubernetes, so maybe I'm wrong.
It seems we simply kill the process, which is normally fine for Vault, but with Raft I believe we need another step:
$ vault operator raft remove-peer <peer id>
cc @jasonodonnell
This issue may be more appropriate on hashicorp/vault-helm, but we can leave it here for now until there's a bit more investigation.
Thanks!
Hi @ngarafol,
The following environment variable needs to be added to the Vault StatefulSet for this to work:
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOSTNAME):8201"
This will change Vault to use DNS names instead of IP addresses when tracking nodes in the cluster.
Hope that helps!
I had to make more modifications. If I do as @jasonodonnell proposed, the hostname does not get substituted, so vault status reads:
/ $ vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.4.0-beta1
Cluster Name vault-cluster-769d437c
Cluster ID f0c86e29-32aa-9626-745d-11c8fc5c9083
HA Enabled true
HA Cluster https://$(HOSTNAME):8201
HA Mode active
So I had to add:
- name: HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
and
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOST_NAME).vault-headless:8201"
inside the env section of the server-statefulset yaml template file. Also, note that I used $(HOST_NAME).vault-headless because that is the only record that resolves inside the pods. Using only the hostname won't resolve for some weird reason.
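For context: Kubernetes only expands $(VAR) references against variables declared earlier in the same env list, and HOSTNAME is set by the kubelet rather than in env, which is why the plain $(HOSTNAME) stayed literal. A minimal sketch of the resulting env block (the vault-headless service name is specific to my release):
env:
  # Pod name, injected via the downward API so it can be expanded below
  - name: HOST_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  # Cluster address built from the pod name and the headless service,
  # so Raft tracks peers by DNS name instead of pod IP
  - name: VAULT_CLUSTER_ADDR
    value: "https://$(HOST_NAME).vault-headless:8201"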
And now, after running kubectl delete pod vault-2, I ran into this (log from vault-0):
2020-03-09T11:19:10.225Z [DEBUG] storage.raft: failed to contact: server-id=$(10.10.117.96) time=32.177825809s
2020-03-09T11:19:11.128Z [DEBUG] core.cluster-listener: creating rpc dialer: alpn=raft_storage_v1 host=raft-64efb27c-0356-3e72-890b-d1d68148edc6
2020-03-09T11:19:11.154Z [ERROR] storage.raft: failed to heartbeat to: peer=vault-2.vault-headless:8201 error="dial tcp: lookup vault-2.vault-headless on 10.10.0.3:53: no such host"
2020-03-09T11:19:12.707Z [DEBUG] storage.raft: failed to contact: server-id=$(10.10.117.96) time=34.659817361s
Somehow it won't resolve vault-2.vault-headless from the vault-0 pod, but nslookup works inside the pod:
$ kubectl exec -it vault-0 nslookup vault-2.vault-headless 10.10.0.3
Server: 10.10.0.3
Address 1: 10.10.0.3 coredns.kube-system.svc.in....
Name: vault-2.vault-headless
Address 1: 10.10.117.97 10-10-117-97.vault.default.svc.in....
The Raft config is showing three nodes, but one has the wrong IP (the one from before the pod deletion).
Other than deleting the pod, what would be an appropriate way to simulate a pod going missing?
EDIT: Seems I got it working, but I'll have to use the hostname instead of pod_ip as node_id in the future to avoid confusion :nerd_face:
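For reference, a rough sketch of one way to do that, assuming your Vault version supports the VAULT_RAFT_NODE_ID environment variable (the env-var form of the raft storage stanza's node_id setting) and that you can add extra env entries to the server StatefulSet:
# Hypothetical extra env entry: use the pod name as the Raft node ID
# instead of the auto-generated UUID
- name: VAULT_RAFT_NODE_ID
  valueFrom:
    fieldRef:
      fieldPath: metadata.name
I'd expect this to matter only for nodes that haven't written any Raft data yet, since existing peers already have a node ID persisted on their PVC.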
Hi @ngarafol, can you update the issue with a clearer set of instructions? Also, is the Raft backend working well for you? I would really like to move towards Raft and away from Consul.
Hi @webmutation. This issue is based on instructions from @jasonodonnell listed here https://github.com/hashicorp/vault-helm/issues/40
Basically, you need vault-helm master and need to pull the files from https://github.com/hashicorp/vault-helm/pull/58 and merge them locally. I am using transit unseal, but it doesn't matter how you unseal Vault.
Since it's safer to use a hostname than an IP address, you can edit the settings like I did here: https://github.com/hashicorp/vault/issues/8489#issuecomment-596484299
Regarding Raft itself, I have been using it for a few days total, so I can't comment at the moment. We also have a Consul-backed setup, but I am testing the Raft one.
If all this is still too brief, @jasonodonnell or I can try to write a more detailed guide when time permits.
Thanks. That should be enough to get me going.
The part that was less clear to me was the settings file; I am unsure what changed regarding the hostname, whether it was only what you commented on or whether you did something more. Indeed it would seem that DNS instead of IP would be the only way to resolve this... but my main worry is what happens if the pod gets rescheduled and is not terminated with vault operator raft remove-peer <peer id>.
Did you observe any split-brain situations so far?
Yeah, we use Consul as well. I know it is the best-supported backend, but it seems like huge overkill; embedded Raft would be more lightweight and easier to manage.
@catsby
Not 100% sure, but thinking aloud: removing the Raft peer is not necessary here. I wanted the peer with the same ID (and a new IP) to return to the cluster. If you remove it, you have to manually connect the peer to the leader, and I don't want to do that. I wanted to test HA resilience by simulating a (probably bad) example - deleting a pod. EDIT: Or could I be wrong? Since PVCs are used, will the new node know who the leader is and try to connect to it again? If that is true, what happens in case the leader changes to a different node before the new node "boots up"?
Hi @ngarafol,
The following environment variable needs to be added to the Vault StatefulSet for this to work:
- name: VAULT_CLUSTER_ADDR
  value: "https://$(HOSTNAME):8201"
This will change Vault to use DNS names instead of IP addresses when tracking nodes in the cluster.
Hope that helps!
I have the same question as this post.
I do believe that using the hostname will help. However, what I really want is a way to auto-rejoin when a new pod is deployed and the old pod gets deleted.
Right now, the only way is to use a retry_join config with fixed entries.
Update in 2022: auto-rejoin can be achieved via auto_join with k8s as the provider. Refer to: https://github.com/hixichen/deploy-open-source-vault-on-gke/blob/main/helm/values-dev.yaml#L116
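For anyone landing here, a rough sketch of the relevant part of the Helm values with that approach (the namespace and label selector here are assumptions and must match your release; the Vault service account also needs RBAC permission to list pods for the k8s provider to work):
server:
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/vault/data"

          retry_join {
            # Discover peers by label via the go-discover k8s provider
            # instead of hard-coding leader_api_addr entries
            auto_join        = "provider=k8s namespace=vault label_selector=\"app.kubernetes.io/name=vault,component=server\""
            auto_join_scheme = "https"
          }
        }
The real config block also needs the listener stanza and the other settings from the chart's default config; this only shows the Raft part.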
Auto rejoin works for me, as I said. I deleted a node (pod) and the new node (pod) automatically rejoined, since by raft_id it's the same node...
No, it seems that it is not possible to recover a Raft cluster if IP addresses are used and they change.
I have deployed the Helm chart hashicorp/vault-helm in HA mode with Raft and 3 nodes. By default it injects POD_IP addresses everywhere, and the Raft setup looks like:
$ vault operator raft list-peers
Node Address State Voter
---- ------- ----- -----
91ba5725-c624-9915-1fbb-3a8ec171e29f 100.96.12.86:8201 leader true
d2b72ece-c095-4289-0ee1-a29d60b84324 100.96.14.119:8201 follower true
f712c3ed-c2a2-9b7d-f83c-effaad8a99af 100.96.8.104:8201 follower true
If I then take down all the Vault nodes by deleting the Helm chart with $ helm delete --purge vault (leaving the PVCs and PVs intact, so the storage is not removed) and deploy the same Helm chart again, my Kubernetes cluster assigns completely different IP addresses to all Vault nodes. I get the following situation, which is impossible to recover from (almost no command works):
$ vault status
Key Value
--- -----
Recovery Seal Type shamir
Initialized true
Sealed false
Total Recovery Shares 1
Threshold 1
Version 1.4.2
Cluster Name vault-cluster-c8fdde71
Cluster ID 8fccaa29-df37-4211-9dfb-17f5d5393a8d
HA Enabled true
HA Cluster https://100.96.12.86:8201
HA Mode standby
Active Node Address https://100.96.12.86:8200
Raft Committed Index 2652
Raft Applied Index 2652
$ vault token lookup
Error looking up token: context deadline exceeded
$ vault operator raft list-peers
Error reading the raft cluster configuration: context deadline exceeded
$ vault operator raft join https://vault-api-addr:8200
Error joining the node to the raft cluster: Error making API request.
URL: POST https://127.0.0.1:8200/v1/sys/storage/raft/join
Code: 500. Errors:
* raft storage is already initialized
{"@level":"info","@message":"entering candidate state","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.379002Z","node":{},"term":544}
{"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2020-06-17T13:28:56.380535Z","alpn":"raft_storage_v1","host":"raft-a906b8db-1279-2d66-4075-be3f5f55b544"}
{"@level":"debug","@message":"votes","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.382887Z","needed":2}
{"@level":"debug","@message":"vote granted","@module":"storage.raft","@timestamp":"2020-06-17T13:28:56.382932Z","from":"d2b72ece-c095-4289-0ee1-a29d60b84324","tally":1,"term":544}
{"@level":"debug","@message":"creating rpc dialer","@module":"core.cluster-listener","@timestamp":"2020-06-17T13:28:56.382978Z","alpn":"raft_storage_v1","host":"raft-a906b8db-1279-2d66-4075-be3f5f55b544"}
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2020-06-17T13:28:58.580081Z","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 100.96.12.86:8201: i/o timeout\""}
{"@level":"error","@message":"failed to make requestVote RPC","@module":"storage.raft","@timestamp":"2020-06-17T13:29:00.270019Z","error":"dial tcp 100.96.12.86:8201: i/o timeout","target":{"Suffrage":0,"ID":"91ba5725-c624-9915-1fbb-3a8ec171e29f","Address":"100.96.12.86:8201"}}
{"@level":"error","@message":"failed to make requestVote RPC","@module":"storage.raft","@timestamp":"2020-06-17T13:29:00.273109Z","error":"dial tcp 100.96.8.104:8201: i/o timeout","target":{"Suffrage":0,"ID":"f712c3ed-c2a2-9b7d-f83c-effaad8a99af","Address":"100.96.8.104:8201"}}
{"@level":"debug","@message":"forwarding: error sending echo request to active node","@module":"core","@timestamp":"2020-06-17T13:29:03.580091Z","error":"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp 100.96.12.86:8201: i/o timeout\""}
{"@level":"warn","@message":"Election timeout reached, restarting election","@module":"storage.raft","@timestamp":"2020-06-17T13:29:03.957558Z"}
As you can see, it attempts to connect to the previous active node's IP address (100.96.12.86), but there is no Vault node on that IP anymore. And with vault operator raft join it is not possible to join a valid Vault cluster, because the Raft storage is already initialized. The only solution is to use DNS everywhere, as @jasonodonnell suggested, or you risk losing access to Vault after a disaster.
If the node was removed via remove-peer, you'd have to clear out its Raft data first (i.e. the directory specified in the config's storage.path) in order to rejoin it to the cluster. It'd be good to take a backup of that dir or move it elsewhere before you do so, just in case.
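Roughly, as a sketch (assuming the chart's default /vault/data storage path and the vault-2 pod from the earlier example; adjust names to your release):
# keep a copy of the old data outside the pod, just in case
$ kubectl cp vault-2:/vault/data ./vault-2-data-backup
# wipe the stale Raft state under the configured storage.path
$ kubectl exec vault-2 -- sh -c 'rm -rf /vault/data/*'
# restart the pod so it comes back empty and can rejoin as a new peer
$ kubectl delete pod vault-2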
@jasonodonnell cc : @hixichen
I am running integrated storage with Raft, version 1.9.3.
I have the DNS set up to use the headless service:
- name: VAULT_CLUSTER_ADDR
  value: https://$(HOSTNAME).musw2-0-vault-internal:8201
This is my retry_join:
storage "raft" {
  path = "/vault/data"

  retry_join {
    leader_api_addr = "https://musw2-0-vault-0.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-1.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-2.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-3.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
  retry_join {
    leader_api_addr = "https://musw2-0-vault-4.musw2-0-vault-internal:8201"
    leader_ca_cert_file = "/vault/userconfig/vault-server-tls/vault.ca"
    leader_client_cert_file = "/vault/userconfig/vault-server-tls/vault.crt"
    leader_client_key_file = "/vault/userconfig/vault-server-tls/vault.key"
  }
}
I can confirm the DNS entries are correct. The other nodes are getting connection refused, cannot join the cluster, and the heartbeat is failing.
Logs:
2022-06-24T19:21:45.815Z [ERROR] storage.raft: failed to heartbeat to: peer=musw2-0-vault-1.musw2-0-vault-internal:8201 error="dial tcp 172.20.9.24:8201: connect: connection refused"
2022-06-24T19:21:46.050Z [INFO]  http: TLS handshake error from 10.241.247.199:4602: EOF
2022-06-24T19:21:46.540Z [INFO]  http: TLS handshake error from 10.241.247.198:55618: EOF
2022-06-24T19:21:46.645Z [ERROR] storage.raft: failed to appendEntries to: peer="{Nonvoter f8cf0d96-a735-2172-90da-111e83423303 musw2-0-vault-1.musw2-0-vault-internal:8201}" error="dial tcp 172.20.9.24:8201: connect: connection refused"
2022-06-24T19:21:47.089Z [WARN]  core.cluster-listener: no TLS config found for ALPN: ALPN=["h2", "http/1.1"]
I have 2 separate clusters running and have run vault operator init on the one above.
The other cluster has similar logs from all nodes and none are unsealed. I am using Azure Key Vault for auto-unseal.
This is critical for our implementation; we are Enterprise customers and I will be reaching out, but wanted to post here as well.
Thanks.
@fewknow Thanks for sharing, but I think your issue is more related to the cluster being sealed. The original issue I had (OP) was that Raft rejoin would not work on an already unsealed cluster, since IP addresses were used instead of FQDNs.
@ngarafol - yes, my issue was just the port; changing 8201 to 8200 solved it. Sorry about the noise.
I suspect that the issue is related to the setup/configuration (in Azure?).
Hey @ngarafol, do you still require further input here, or is it okay to close? Sorry I'm late here and trying to understand what's next.
Has this issue been reproduced in a current version of Vault? Please let me know if this is still applicable. Thanks!
The original issue was due to IPs being used instead of FQDNs. I believe as long as FQDNs are used, this issue does not exist at all. Will close, feel free to reopen.
Describe the bug: Using the Vault Helm charts with a Raft HA setup. After unsealing and joining peers to Raft, deleting one of the pods makes it unable to rejoin the Raft cluster, and the other nodes still try to communicate with the old pod.
To Reproduce: Steps to reproduce the behavior:
Expected behavior: The new node should be able to rejoin the Raft cluster, and the other nodes should stop using the old Raft node.
Environment:
Vault server configuration file(s):
Using instructions from here: https://github.com/hashicorp/vault-helm/issues/40
Logs and k8s info show that after deleting the vault-2 pod, the vault-0 pod is still using the old node_id. The same behaviour occurs with 1.4.0-beta1.
I managed to run raft remove-peer to remove the old peer, but the new pod still can't rejoin, and I don't know how to proceed, so I need some guidance.