oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0

Want ability to have anti-affinity #1705

Open rmustacc opened 2 years ago

rmustacc commented 2 years ago

There are a number of different classes of services that folks run that want to be HA. This runs the gamut and generally includes things like databases (e.g. MySQL, Cockroach, Postgres, MSSQL) and things that have built-in consensus (Raft, ZooKeeper, etc.).

Most of these boil down to saying: I would like to ensure that all of the instances of this thing don't end up in the same failure domain. For most smaller deployments (e.g. a single rack) this failure domain is probably the individual compute sled. For a multi-rack scenario, the failure domain may be larger and extend to the rack level or even the cell level.
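As a rough illustration of the constraint described above, here is a minimal sketch of an anti-affinity placement check. Everything in it is an assumption made for illustration: the `FailureDomain`, `Sled`, and `place` names are hypothetical and are not part of Nexus or any existing Omicron API, and a real scheduler would also weigh capacity and other policies.

```rust
use std::collections::HashSet;

/// Granularity at which instances of a group must be kept apart.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FailureDomain {
    Sled,
    Rack,
}

/// A compute sled, identified by its rack and its index within that rack.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sled {
    pub rack: u32,
    pub sled: u32,
}

impl Sled {
    /// Key naming the failure domain this sled falls into at the
    /// requested granularity. Two sleds share a domain iff their keys match.
    fn domain_key(&self, domain: FailureDomain) -> (u32, Option<u32>) {
        match domain {
            FailureDomain::Sled => (self.rack, Some(self.sled)),
            FailureDomain::Rack => (self.rack, None),
        }
    }
}

/// Pick the first candidate sled whose failure domain is not already
/// occupied by an existing member of the anti-affinity group.
/// Returns `None` when every candidate shares a domain with a member.
pub fn place(
    candidates: &[Sled],
    group_members: &[Sled],
    domain: FailureDomain,
) -> Option<Sled> {
    // Collect the domains already used by the group.
    let occupied: HashSet<_> = group_members
        .iter()
        .map(|s| s.domain_key(domain))
        .collect();

    // First candidate in an unused domain wins.
    candidates
        .iter()
        .copied()
        .find(|s| !occupied.contains(&s.domain_key(domain)))
}
```

With one group member already on rack 0, sled 0, a sled-level policy would still allow rack 0, sled 1, while a rack-level policy would skip the rest of rack 0 and only accept a sled on another rack. Whether an unsatisfiable request should fail outright (as `None` does here) or fall back to best-effort placement is a policy question this sketch leaves open.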

This issue is to track that we want to think about what this means for Nexus, considering what others have done, and eventually write up an RFD on it that accounts for a multi-rack future.

dwradcliffe commented 2 months ago

Even just for testing, this would be very useful for us.