hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Separate HA coordination from physical backend #395

Closed BRMatt closed 8 years ago

BRMatt commented 9 years ago

We're interested in using Vault for encrypting a reasonable number of small payloads with the transit backend. One concern we have is that running an HA/standby setup seems to force you to store all encryption keys and authentication config in Consul/ZooKeeper/etcd. We were hoping to store the encryption keys in more durable stores such as S3/RDS (mainly because those systems are easier to back up and restore), but also have several unsealed standby servers in case the leader dies. We're already using Consul for service discovery in our infrastructure, so ideally we'd use it for coordination too.

Is it feasible to separate the configuration of the HA backend from the configuration of the physical backend, or is there some underlying requirement for them to be coupled?

I had a look through the code and it seems like the leadership election logic is nicely contained in vault/core.go. Part of the leader election process appears to use the physical backend for advertising its address, but aside from that I couldn't see any obvious points where the two types of backend overlap significantly.
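
To make the idea concrete, here is a very rough sketch of the seam we have in mind (these names are made up for illustration and are not existing Vault types): the core would take a physical backend for durable storage and an optional, separate HA backend used only for locking/leader election.

```go
// Illustration only: hypothetical types showing storage split from HA coordination.
package sketch

// PhysicalBackend is the durable store for all barrier data (e.g. S3/RDS).
type PhysicalBackend interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
	List(prefix string) ([]string, error)
}

// Lock is held by the active node; losing it would drop the node to standby.
type Lock interface {
	Lock(stopCh <-chan struct{}) (<-chan struct{}, error)
	Unlock() error
}

// HABackend provides only locking/leader election (e.g. Consul/ZooKeeper/etcd).
type HABackend interface {
	LockWith(key, value string) (Lock, error)
}

// CoreConfig is the proposed split: HA coordination is optional and
// independent of where the encrypted data actually lives.
type CoreConfig struct {
	Physical PhysicalBackend // e.g. an S3-backed store
	HA       HABackend       // nil means single node, no standby support
}
```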

armon commented 9 years ago

@BRMatt It would be possible to split, marking as an enhancement.

BRMatt commented 9 years ago

Nice, we'll likely have a stab at this over the next month or so. At the moment we're just exploring how we can use it in our app.

armon commented 9 years ago

Awesome! Let me know as you look into this and I'd be happy to provide feedback.

JeanMertz commented 9 years ago

:+1: to this. Being able to use S3 as the storage backend greatly simplifies the recovery scenario, but without the HA capabilities, this is not a viable option for us.

jefferai commented 9 years ago

One workaround, even if not ideal, is to run Vault on VMs using S3 or others for their backing store. Then you can snapshot and back up via normal S3 means.

sepiroth887 commented 9 years ago

We have been thinking about running Vault on top of Mesos. There is no good story for persistent storage there without losing its flexibility: data is GC'ed when a task moves or fails, and in that case Vault may end up susceptible to data loss.

It's really tempting for me to drive leader election (very lightweight on I/O) through a shared ZK cluster that other frameworks and tools (e.g. Kafka) could use for the same purpose.

That's not feasible though once Vault has to serve potentially thousands, if not tens of thousands, of reads from the same ZK cluster for data.

That said, a feature like that adds more complexity and failure modes, which I'm not sure warrant the benefit yet. I really like the current stability of the platform.

sepiroth887 commented 9 years ago

I briefly looked into what it would take to make this more pluggable, but my Go is still very Java-esque.

What I've been thinking is to remove the HABackend interface in favor of a function variable assigned to Backend, which can be nil if no HA is required, and otherwise serves as the entry point to the Lock function.

Does that make sense?
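
Very roughly, something like this (just a sketch of the idea, not working code):

```go
// Sketch: no separate HABackend interface; the Backend carries an optional
// lock entry point that is nil when HA coordination isn't configured.
package sketch

type Lock interface {
	Lock(stopCh <-chan struct{}) (<-chan struct{}, error)
	Unlock() error
}

// LockFunc is the HA entry point: given a key/value it returns a Lock.
type LockFunc func(key, value string) (Lock, error)

type Backend struct {
	// ... the usual Put/Get/Delete/List storage pieces ...

	// LockWith is nil if no HA is required; otherwise the core uses it
	// for leader election.
	LockWith LockFunc
}

func haEnabled(b *Backend) bool {
	return b.LockWith != nil
}
```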

Working on prototyping this. May attach a gist/link to my fork once I get past some godep issues...

jefferai commented 9 years ago

> We have been thinking about running Vault on top of Mesos. There is no good story for persistent storage there without losing its flexibility: data is GC'ed when a task moves or fails, and in that case Vault may end up susceptible to data loss.

What do you mean by "no good story for persistent data storage"? Consul/etcd/ZK are self-replicating; you just have to ensure Mesos keeps a minimum number running to keep quorum. Vault isn't susceptible to data loss in such a scenario unless it's not set up properly. What is the actual scenario that you are concerned about?

> That's not feasible though once Vault has to serve potentially thousands, if not tens of thousands, of reads from the same ZK cluster for data.

I can't point to benchmarks, but we've had a lot of feedback from users and customers indicating that Consul is significantly faster than etcd and ZK. Vault also has a large read cache to avoid hitting backend storage for reads of the same data over and over. There are also consistency tunables you could use with (at least) Consul, if you felt like eventual consistency was good enough for your needs. This would allow non-leader Consul nodes to serve reads.

What concerns me -- and this is a separate issue from whether an HA mechanism separate from the backing store could be used -- is that it seems like you are making assumptions about how Vault will scale based on backend storage alone. Depending on what exactly you are doing with Vault (you haven't mentioned how you actually plan to use it), it could scale extremely well due to its read cache or due to using e.g. Consul rather than ZooKeeper, or it could scale very poorly if you are e.g. using features that are constantly generating cryptographic keys and you don't have enough entropy on the system.

Before assuming that the system will behave in a particular way, I think you would be well served performing some quantitative analysis that tests Vault based on how you are planning to actually use it. You may find that it indeed struggles to scale, or you may find that it's completely happy at well over your actual needs while running on ZK or Consul.
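
As a trivial starting point, a read-throughput smoke test can be as small as the sketch below (this uses the Go API client against whatever backend you're evaluating; the address, token, and path are placeholders, and a real test should mirror your actual access patterns):

```go
// Crude read-throughput smoke test: repeatedly read one secret path and
// report requests/sec. Not a rigorous benchmark, just a starting point.
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/hashicorp/vault/api"
)

func main() {
	cfg := api.DefaultConfig() // honors VAULT_ADDR
	client, err := api.NewClient(cfg)
	if err != nil {
		log.Fatal(err)
	}
	client.SetToken("test-token") // placeholder token

	const n = 10000
	start := time.Now()
	for i := 0; i < n; i++ {
		// "secret/benchmark" is a placeholder path you'd seed beforehand.
		if _, err := client.Logical().Read("secret/benchmark"); err != nil {
			log.Fatal(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%d reads in %s (%.0f req/s)\n", n, elapsed, float64(n)/elapsed.Seconds())
}
```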

sepiroth887 commented 9 years ago

Very good points thank you :)

What I'd like us to protect against is Mesos losing all tasks and cleaning up by deleting the "ephemeral" data. I assume the same risk (albeit a low one) may exist with Consul.

The other points I will have to investigate. I think most ops in Vault will be reads of various paths in the generic secret backend, with some use of the PKI backend for cert generation.

So with read caching this may not be an issue at all.

I will do a deep dive into what it will take us to operationalize Consul as well.

JeanMertz commented 8 years ago

> Consul/etcd/ZK are self-replicating; you just have to ensure Mesos keeps a minimum number running to keep quorum.

That's exactly the kind of failure scenario I'd like to protect against. Also, having persistent state "swarming" around in your cluster is less than ideal; it changes the mindset of managing that cluster.

If Consul was only used for HA leader election, and S3 could be used as the storage backend, then losing an entire cluster means downtime, but no data loss. Currently, we solve this by mounting persistent disks to our Consul containers, but it's cumbersome, and far less ideal than the setup described above.

jefferai commented 8 years ago

Note that for Consul, consul-replicate can be used to stream Consul state to a remote server for backup. Just as an FYI, if you didn't know about it.

sepiroth887 commented 8 years ago

We have been running with attached disks for persistence for a while now and it works. The reason I was happy to put up with it is that we run more than just this ZooKeeper in the same fashion, so it's not really a snowflake.

Though I'd love to see the same semantics as Mesos, where ZooKeeper is only used for leader election and the backend can be chosen from all the other storage providers (including ZooKeeper).

moofish32 commented 8 years ago

So I am interested in the S3 capability separate from the HA feature; count that as a +1 for @BRMatt's original feature request.

However, WHY do I want this?

  1. Storing encrypted sensitive information in S3 is already approved, whereas Consul would require another explanation to the approval authorities (which is doable if this was the only need)
  2. It turns out I have larger chunks of information riddled with sensitive data that I don't really want to redact; I just want to store the document (or blob) and pay for the storage
  3. Vault already handles rolling my keys across this larger volume of S3-stored data, and I don't want to reinvent that code, get it wrong three times, and maintain some proprietary solution

So while this should be doable if this feature is added, I could solve my problem by:

  1. Use the transit backend for encryption as a service and write the data to S3 (rough sketch below)
  2. Roll fairly simple key rotation based on S3 metadata (imagine rigorous testing here)
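
For item 1, a rough sketch of what I mean (the key name, bucket, and paths are placeholders; key rotation/rewrap handling is omitted):

```go
// Rough sketch: encrypt a blob with Vault's transit backend,
// then store the resulting ciphertext in S3.
package main

import (
	"encoding/base64"
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	vault "github.com/hashicorp/vault/api"
)

func main() {
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	client.SetToken("example-token") // placeholder

	doc := []byte(`{"ssn":"123-45-6789","note":"larger sensitive document"}`)

	// Transit expects base64-encoded plaintext; "docs-key" is a placeholder key name.
	secret, err := client.Logical().Write("transit/encrypt/docs-key", map[string]interface{}{
		"plaintext": base64.StdEncoding.EncodeToString(doc),
	})
	if err != nil {
		log.Fatal(err)
	}
	ciphertext := secret.Data["ciphertext"].(string) // e.g. "vault:v1:..."

	// Store only the ciphertext in S3; re-encrypting after a key rotation
	// would be a separate pass over the bucket via transit/rewrap/docs-key.
	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-east-1")}))
	_, err = s3.New(sess).PutObject(&s3.PutObjectInput{
		Bucket: aws.String("example-bucket"),
		Key:    aws.String("docs/doc-1234"),
		Body:   strings.NewReader(ciphertext),
	})
	if err != nil {
		log.Fatal(err)
	}
}
```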

This leads to the final question: do I really just want a blob storage API? If I could call an API for blob data that supported pluggable backends (think S3 first, but why not Dynamo with its new Encryption Client, Mongo, Cassandra, or the next amazing big data store with community support), I don't think I'd be asking for HA separation. Does anybody else have this use case, and does it fit more cleanly within the Vault design, or am I trying to grossly misuse the tool?