kubernetes / community

Kubernetes community content
Apache License 2.0

Proposal: Implement stateful TPR Flock #424

Closed tamalsaha closed 7 years ago

tamalsaha commented 7 years ago

Birds of a feather flock together

When we run stateful apps (apps that store data on disk) such as GlusterFS or various databases, we have to choose which Kubernetes object to use for provisioning them. Here are the requirements:

This can't be achieved on cloud providers that have no native support for persistent storage, or for which Kubernetes has no volume controller (e.g., DigitalOcean, Linode).

Here is my proposal on how to meet the above requirements in a cloud provider agnostic way.

StatefulSet: If the underlying cloud provider has native support for cloud disks and that support is built into Kubernetes (aws/gce/azure), then we can use StatefulSet. We can provision disks manually and bind them to claims. We might also be able to provision them using dynamic provisioning. Moreover, StatefulSets allow using the pod name as a stable network ID. Users can also use pod placement options to ensure that pods are distributed across nodes. This allows for HA.

DaemonSet: Cloud providers that do not offer built-in storage and/or have no native support in Kubernetes (e.g., DigitalOcean, Linode) can't use StatefulSets to run stateful apps. Stateful apps running in these clusters must use hostPath to store data or risk losing it when pods restart. StatefulSet can't dynamically provision hostPath-bound PVCs. In these cases, we could use DaemonSet. We have to use a hostPath or emptyDir type volume with the DaemonSet. If DaemonSets run on the pod network, no stable ID is possible. If DaemonSets run on the host network, they might use the node IP, since node names are generally not routable. But node IPs are not stable either, since most of the time they are allocated via DHCP. Also, on cloud providers like DigitalOcean, the host network is shared, so it is not safe to run apps on it without authentication.

Luckily, we can achieve something similar to StatefulSet in such providers. The underlying process is based on how named headless services work as described here: https://kubernetes.io/docs/admin/dns/ .
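As a rough illustration of the DNS mechanism referred to above: a pod that sets `hostname` and `subdomain`, with a headless Service named after the subdomain in the same namespace, gets a DNS name of the form `<hostname>.<subdomain>.<namespace>.svc.<cluster-domain>`. The sketch below builds such a Service as a plain dict and computes the resulting name. The helper names, the `gluster` example values, the port number, and the default cluster domain `cluster.local` are assumptions for illustration, not part of the proposal.

```python
def headless_service(name, namespace, selector):
    """Build a headless Service manifest (clusterIP: None) as a plain dict.

    The Service name must match the `subdomain` set on the pods.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",  # "None" is what makes the Service headless
            "selector": selector,
            # Port value is an arbitrary placeholder for this sketch.
            "ports": [{"name": "placeholder", "port": 1234}],
        },
    }


def pod_fqdn(hostname, subdomain, namespace, cluster_domain="cluster.local"):
    """DNS name a pod gets when it sets hostname and subdomain."""
    return f"{hostname}.{subdomain}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    svc = headless_service("gluster", "default", {"app": "gluster"})
    print(svc["metadata"]["name"], svc["spec"]["clusterIP"])
    print(pod_fqdn("gluster-0", "gluster", "default"))
```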

In these types of providers, we have to run N ReplicaSets, each with replicas=1. We can use a fixed hostPath. We can choose N nodes and index them from 0..N-1. We apply a nodeSelector to these ReplicaSets to ensure that the ReplicaSet with index i always runs on the node with index i. Since the pods are on separate nodes, they can safely use the same host path. For the network ID, we set both hostname and subdomain in the PodTemplate of each ReplicaSet. This gives the pods a DNS name the same way StatefulSet pods get one. Since these pods use the pod network, it should be safe to run applications without authentication. Now we have N pods with stable names, running on different nodes and using hostPath. Voila!
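A minimal sketch of the N-ReplicaSets layout described above, generating the manifests as plain dicts. It assumes each chosen node has already been labeled with its index (for example with `kubectl label node <node> flock-index=0`); the label key `flock-index`, the function name, the mount path `/data`, and the example image are hypothetical. It uses `apps/v1` for the ReplicaSet API group, which is the current one rather than the one available when this was proposed.

```python
def flock_replicasets(name, n, image, host_path,
                      namespace="default", node_label="flock-index"):
    """Return n ReplicaSet manifests, each with replicas=1 and pinned to one node.

    ReplicaSet i carries nodeSelector {node_label: str(i)}, so it only
    schedules onto the node labeled with index i.
    """
    manifests = []
    for i in range(n):
        pod_name = f"{name}-{i}"
        labels = {"app": name, "flock-pod": pod_name}
        manifests.append({
            "apiVersion": "apps/v1",
            "kind": "ReplicaSet",
            "metadata": {"name": pod_name, "namespace": namespace},
            "spec": {
                "replicas": 1,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {
                        # hostname + subdomain give the pod a stable DNS name,
                        # provided a headless Service named `name` exists.
                        "hostname": pod_name,
                        "subdomain": name,
                        "nodeSelector": {node_label: str(i)},
                        "containers": [{
                            "name": name,
                            "image": image,
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/data"},
                            ],
                        }],
                        # Same host path on every node; safe because each
                        # ReplicaSet is pinned to a different node.
                        "volumes": [
                            {"name": "data", "hostPath": {"path": host_path}},
                        ],
                    },
                },
            },
        })
    return manifests


if __name__ == "__main__":
    for rs in flock_replicasets("gluster", 3, "example/gluster", "/var/lib/flock"):
        spec = rs["spec"]["template"]["spec"]
        print(rs["metadata"]["name"], spec["nodeSelector"])
```

The `app` label is shared by all pods so that a single headless Service can select them, while the per-pod label keeps each ReplicaSet's selector from matching its siblings.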

To simplify the full process, we can create a new TPR called Flock. We implement GlusterFS or kube databases using this TPR. The Flock controller will be in charge of translating a Flock object into the appropriate Kubernetes objects, based on flags set on the controller.
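To make the translation step concrete, here is a hedged sketch of how such a controller might pick the target objects. The proposal does not define the Flock spec or the controller flags, so the `name` and `replicas` fields and the `storage_mode` values `cloud-disk` and `hostpath` are all invented for illustration.

```python
def plan_objects(flock, storage_mode):
    """Decide which Kubernetes objects a Flock would translate into.

    `flock` is a hypothetical spec: {"name": str, "replicas": int}.
    `storage_mode` stands in for a controller flag (hypothetical values).
    Returns a list of (kind, object_name, replicas) tuples.
    """
    name, n = flock["name"], flock["replicas"]
    if storage_mode == "cloud-disk":
        # Provider has disks Kubernetes can manage: one StatefulSet suffices.
        return [("StatefulSet", name, n)]
    if storage_mode == "hostpath":
        # No managed disks: one single-replica ReplicaSet per indexed node.
        return [("ReplicaSet", f"{name}-{i}", 1) for i in range(n)]
    raise ValueError(f"unknown storage mode: {storage_mode!r}")


if __name__ == "__main__":
    spec = {"name": "gluster", "replicas": 3}
    print(plan_objects(spec, "cloud-disk"))
    print(plan_objects(spec, "hostpath"))
```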

Is there a simpler way to achieve this?

tamalsaha commented 7 years ago

Looks like we can just write or use a dynamic provisioner for HostPath. Thanks @smarterclayton.

https://github.com/kubernetes-incubator/external-storage/tree/master/docs/demo/hostpath-provisioner