openshift / origin

Conformance test suite for OpenShift
http://www.openshift.org
Apache License 2.0

OpenShift HA design #3597

Closed smarterclayton closed 9 years ago

smarterclayton commented 9 years ago

OpenShift masters should be configurable to run in a redundant, failure-tolerant, horizontally scalable manner.

Masters are composed of three components:

In the future, the master should be decomposable into those three roles, each scaled independently.

For an HA setup, the apiserver and proxy bastions should each run more than one instance, load balanced at an endpoint that is signed by the masters' signing authority, for use by nodes and controllers (internal clients) and by web and command-line tools (external clients). The controllers should run as a single instance; if that instance stops running, another instance should be started.
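The "single controller instance, replaced on failure" requirement can be sketched as a lease with a time-to-live. This is only an illustration under stated assumptions, not how OpenShift implements it: `LeaseStore` is a hypothetical in-memory stand-in for a shared store such as etcd, and the controller names, timestamps, and TTL are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class LeaseStore:
    """Hypothetical in-memory stand-in for a shared store such as etcd."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, candidate: str, now: float, ttl: float) -> bool:
        """Grant or renew the lease if it is free, expired, or already ours."""
        lease = self._lease
        if lease is None or lease.expires_at <= now or lease.holder == candidate:
            self._lease = Lease(holder=candidate, expires_at=now + ttl)
            return True
        return False

    def holder(self, now: float) -> Optional[str]:
        """Return the active holder, or None if the lease has lapsed."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = LeaseStore()
# controller-a starts first and becomes the active instance.
a_active = store.try_acquire("controller-a", now=0.0, ttl=10.0)
# controller-b is a standby; it cannot take over while the lease is live.
b_active_early = store.try_acquire("controller-b", now=5.0, ttl=10.0)
# controller-a stops renewing; once the TTL lapses, controller-b takes over.
b_active_late = store.try_acquire("controller-b", now=11.0, ttl=10.0)
```

The active instance would renew the lease on a timer shorter than the TTL; a standby that wins the lease starts running the controller loops.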

Design points:

Run a singleton OpenShift master (the bootstrapper) managing a small number of infrastructure nodes, which host a set of masters and other infrastructure. The bootstrapper runs a replica set of masters and etcd nodes, offers persistent storage for etcd, manages health checks on the masters, and restarts the masters when they terminate. The bootstrapper may leverage secrets to store the config for the hosted masters. The internal and external endpoints for the hosted masters should leverage internal and external service DNS names and the NodePort function to yield stable DNS. Other infrastructure components may be hosted with the master.
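The bootstrapper's "health check and restart" duty amounts to a reconcile loop over a fixed number of master slots. A minimal sketch, assuming a hypothetical `reconcile` helper and made-up slot names like `master-0`; it only models the bookkeeping and does not start real processes.

```python
from typing import Dict, List, Tuple


def reconcile(
    desired: int, health: Dict[str, bool], prefix: str = "master"
) -> Tuple[Dict[str, bool], List[str]]:
    """Return the reconciled state and the slots that had to be (re)started.

    `health` maps an instance name to the result of its last health check.
    Slots are named prefix-0 .. prefix-(desired-1) so names stay stable.
    """
    restarted: List[str] = []
    state: Dict[str, bool] = {}
    for i in range(desired):
        name = f"{prefix}-{i}"
        # A slot that is missing or failed its health check gets restarted.
        if not health.get(name, False):
            restarted.append(name)
        state[name] = True
    return state, restarted


# master-1 failed its health check and master-2 was never started.
observed = {"master-0": True, "master-1": False}
new_state, restarted = reconcile(3, observed)
```

Running `reconcile` again on `new_state` restarts nothing, which is the property that lets the bootstrapper call it on every tick.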

Advantages:

Disadvantages:

Run three (or more) OpenShift infrastructure nodes, with local file manifests for each instance of the master, and create similar manifests for the etcd cluster. Use a local host directory to store the master and etcd config and etcd storage, and have the masters listen on the host network. Configure each infrastructure node to loop back to the current host to connect to the master on the external port. Each node should then be configured to point to a DNS address that resolves to the master infrastructure nodes. Optionally, the infrastructure nodes can use that DNS address.
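From a node's point of view, a DNS name that resolves to several master hosts means "try each address until one answers". A rough sketch of that client-side selection, assuming a hypothetical `pick_master` helper, placeholder addresses, and a health probe passed in as a function so that no network access is needed:

```python
from typing import Callable, Iterable, Optional


def pick_master(
    addresses: Iterable[str], is_healthy: Callable[[str], bool]
) -> Optional[str]:
    """Return the first address whose health probe succeeds, else None."""
    for addr in addresses:
        if is_healthy(addr):
            return addr
    return None


# Placeholder addresses standing in for the records behind the master DNS name.
resolved = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# Pretend the first infrastructure node is down.
down = {"10.0.0.1"}
chosen = pick_master(resolved, lambda addr: addr not in down)
none_left = pick_master(resolved, lambda addr: False)
```

In a real deployment the address list could come from a resolver call such as `socket.getaddrinfo`, and the probe would be an actual request to the master's external port.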

Disadvantages:

TBD:

smarterclayton commented 9 years ago

@liggitt @danmcp @detiber @brenton @twiest @derekwaynecarr @ncdc

This is a draft of the long-term HA design. It's a continuation of the Kube HA design but pushes it into self-hosting and bootstrapping. We're likely to discuss this upstream post 1.0, but I'd rather not spend the next year reinventing the wheel for our core stories.

detiber commented 9 years ago

@smarterclayton I like the idea of bootstrapper hosts, but I dislike the idea of it being a singleton host. We'll need some type of HA story for it, even if it is just RHEL HA in an active/passive manner.

smarterclayton commented 9 years ago

Why do we need an HA story for it? Why can't the HA story be "restart the machine"? Why is active/passive better than "get an instance running again with the same DNS and disks"?

On Mon, Jul 13, 2015 at 11:03 AM, Jason DeTiberus notifications@github.com wrote:

@smarterclayton I like the idea of bootstrapper hosts, but I dislike the idea of it being a singleton host, we'll need some type of HA story for it, even if it is just RHEL HA in an active/passive manner.


Clayton Coleman | Lead Engineer, OpenShift

smarterclayton commented 9 years ago

This is implemented. @sdodson, also FYI.

abhat commented 9 years ago

@smarterclayton is this already in v3.1?

smarterclayton commented 9 years ago

This is in Origin 1.0.6, and will be supported by the Ansible installer for Origin 1.1 and OSE 3.1.