sys/storage/raft/bootstrap should have an option to force bootstrap

drawks commented 2 years ago

Is your feature request related to a problem? Please describe.

I recently ran a disaster recovery exercise on a Vault cluster (mysql storage backend, raft ha_storage) and discovered that if all of the machine-local raft state (raft.db and vault.db) is missing from all nodes AND none of the nodes was previously a participant in the cluster, then no node can be elected active. This makes sense, since there is no assembled raft cluster in which to conduct the vote. However, the current bootstrap API (sys/storage/raft/bootstrap) refuses to bootstrap a raft cluster because the core/raft/tls key exists in the storage backend. Attempts to call the bootstrap API return an error like:

{"errors":["could not generate TLS keyring during bootstrap: TLS keyring already present"]}

However, if that key is manually removed from the storage backend (in my case, by deleting it from mysql), the bootstrap API becomes available again and you can reassemble the raft cluster.
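To make the failure mode above easier to script against, here is a minimal Python sketch. It builds (but does not send) a request to the bootstrap endpoint and checks whether a response body matches the error quoted in this issue. The helper names `build_bootstrap_request` and `is_keyring_conflict` are hypothetical, and the use of POST with an empty JSON body is an assumption, not something confirmed in this thread.

```python
import json
import urllib.request

# Endpoint path as named in this issue, under the usual /v1 API prefix.
BOOTSTRAP_PATH = "/v1/sys/storage/raft/bootstrap"


def build_bootstrap_request(vault_addr: str) -> urllib.request.Request:
    """Build (but do not send) a request to the raft bootstrap endpoint.

    Assumption: the endpoint accepts POST with an empty JSON object.
    """
    url = vault_addr.rstrip("/") + BOOTSTRAP_PATH
    return urllib.request.Request(
        url,
        data=b"{}",
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def is_keyring_conflict(response_body: str) -> bool:
    """Return True if the body contains the error quoted in this issue."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors") or []
    return any("TLS keyring already present" in str(e) for e in errors)
```

A recovery script could use `is_keyring_conflict` to tell this specific condition apart from other bootstrap failures before deciding whether manual cleanup of the storage backend is needed.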

Describe the solution you'd like

Ideally I'd like to not have to do "brain surgery" on the storage backend during a disaster recovery exercise. It would be much cleaner if the sys/storage/raft/bootstrap API took an argument instructing it to destructively re-bootstrap despite evidence of an existing cluster.
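The requested semantics can be sketched as a small decision function. This is purely illustrative: the `force` argument is the hypothetical option this issue asks for and does not exist in Vault, and `bootstrap_decision` is an invented name, not Vault code.

```python
def bootstrap_decision(keyring_present: bool, force: bool = False) -> str:
    """Illustrate the behaviour requested in this issue.

    keyring_present: whether core/raft/tls already exists in storage.
    force: HYPOTHETICAL flag (not a real Vault parameter) asking for a
           destructive re-bootstrap despite the existing keyring.
    """
    if not keyring_present:
        # No trace of a previous cluster: bootstrap normally.
        return "bootstrap"
    if force:
        # Requested behaviour: discard the old HA state, then bootstrap.
        return "wipe-and-bootstrap"
    # Current behaviour described above: refuse with an error.
    return "refuse"
```

With this shape, the default stays safe (an existing keyring still causes a refusal), and the destructive path is taken only when the operator asks for it explicitly.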

Describe alternatives you've considered

AFAICT the only alternative is to manually edit the storage backend.

Explain any additional use-cases

If the number of nodes in a cluster drops below the quorum threshold, this feature would also allow forcing a new leader so that the raft membership could be modified. Currently you cannot add or remove raft peers if there is no active leader.

heatherezell commented 2 years ago

Thanks, @drawks! Always appreciate seeing your comments and contributions. We'll get this in front of the engineering team soon. :)

ncabatoff commented 2 years ago

Have you seen https://www.vaultproject.io/docs/concepts/integrated-storage#manual-recovery-using-peers-json ? I'm not 100% certain that it works for a ha-only raft cluster, but I think it does.

Otherwise, I'm not clear on the use case. How did you get into this situation where the vault.db and raft.db were missing on all nodes? It might be that this is far enough outside of realistic scenarios that manually editing the storage backend is the right approach.
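For readers unfamiliar with the peers.json recovery linked above, here is a minimal Python sketch that generates such a file. It assumes the documented layout of a JSON array of objects with `id`, `address`, and `non_voter` fields. The node IDs, addresses, and directory in the usage note are placeholders, and whether this procedure works for an HA-only raft setup is, as noted, unconfirmed.

```python
import json
from pathlib import Path


def build_peers(nodes):
    """Build the peers list from (node_id, cluster_address) pairs.

    Every peer is marked as a voter here; adjust non_voter if needed.
    """
    return [
        {"id": node_id, "address": address, "non_voter": False}
        for node_id, address in nodes
    ]


def write_peers_json(raft_dir, nodes):
    """Write peers.json into raft_dir and return the resulting path.

    Assumption: raft_dir is the raft directory under the node's
    configured raft storage path, and Vault is stopped while writing.
    """
    path = Path(raft_dir) / "peers.json"
    path.write_text(json.dumps(build_peers(nodes), indent=2) + "\n")
    return path
```

For example, `write_peers_json("/path/to/raft", [("node1", "10.0.0.1:8201")])` would produce a single-voter peer set. The path, node ID, and address are illustrative only.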

drawks commented 2 years ago

@ncabatoff As I mentioned, I was doing a disaster recovery exercise. The setup simulated a complete standup using ONLY a backup of the database, so the new machines had no preserved local storage and were also being brought up in a new network segment. The presumption was that, when using raft for ha_storage only, the cluster could be trivially recreated from the contents of the storage backend. That is /partially/ true: if you start a single machine with HA disabled, it will read in the storage and give you functional access to the data. However, the core/raft/tls key is still persisted. Fundamentally, my ask is for a native function that completely wipes the data used for HA from the primary datastore, so that HA can be reinitialized without the ghosts of the past interfering.

I have not attempted the manual recovery documented in your link. I will give it a try, but my original ask still seems valid. To restate it once more: state from the HA configuration is persisted in the primary data store, which prevents reuse of that data store in a different HA configuration.

ncabatoff commented 2 years ago

Yeah, that's fair. Out of curiosity, why are you using mysql+raft rather than the native mysql HA support?

drawks commented 2 years ago

We found the mysql HA to be generally less reliable.

ncabatoff commented 2 years ago

Then my question is: why not use raft exclusively, rather than have two storage subsystems to maintain, one only community-supported?

drawks commented 2 years ago

It is a valid question, but really an aside to the issue at hand. We have considerable experience with and infrastructure to support mysql as a primary data store. Native raft as the primary backend is appealing in some ways, but is just not the architectural decision we've made at the moment.

aphorise commented 2 years ago

hey @drawks - on a side note since I do see a lot of contributions from you here - I was wondering if you've ever done a performance comparison (speed, memory / data sizes and overall CPU use) of integrated storage vs MySQL (would be interesting to see that separate to this).

Specific to the ask, I'm curious: in these scenarios, can the less initiated follow the Vault Cluster Lost Quorum Recovery steps to recover any remaining node (typically the last known leader) and thereafter rescale / rejoin the other nodes anew?

Obviously the sys/raw approach, with surgery on the internals during recovery, as highlighted earlier, is always an option.
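As a rough illustration of the sys/raw route, the Python sketch below builds (but does not send) a DELETE request for the core/raft/tls key discussed in this thread. The helper name `build_raw_delete_request` is hypothetical. It assumes the sys/raw endpoint is reachable (it is disabled by default and has to be enabled in the server configuration or accessed in recovery mode) and that the caller holds a suitably privileged token.

```python
import urllib.parse
import urllib.request


def build_raw_delete_request(vault_addr, token, key="core/raft/tls"):
    """Build (but do not send) a DELETE against sys/raw for a storage key.

    The default key is the one this issue identifies as blocking
    re-bootstrap. Deleting raw storage entries is destructive.
    """
    # quote() keeps "/" intact by default, so the key path is preserved.
    url = f"{vault_addr.rstrip('/')}/v1/sys/raw/{urllib.parse.quote(key)}"
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"X-Vault-Token": token},
    )
```

Keeping the request construction separate from sending it makes it easy to log or review the exact URL before anything destructive happens.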

hashicorp / vault

sys/storage/raft/bootstrap should have an option to force bootstrap #14032