habitat-sh / habitat-operator

A Kubernetes operator for Habitat services
Apache License 2.0

Use StatefulSets instead of Deployment #44

Closed lilic closed 6 years ago

lilic commented 7 years ago

Currently we use Deployments to deploy our Habitat services, but we do not know what we are deploying or what type of service it is; it could be anything from a DB to a simple Rails application. We should not simply assume that a Habitat service is stateless.

Couple of advantages of StatefulSets:

- stable, persistent storage per Pod (via `volumeClaimTemplates`)
- stable, unique network identities for the Pods
- ordered, graceful deployment, scaling and rolling updates

These would be very useful, especially if our service is, for example, a DB.

asymmetric commented 7 years ago

I think that for the moment, we expect our services to be stateless, and that stateful apps will be outside of the cluster.

So maybe this can be re-visited at a later point?

asymmetric commented 7 years ago

Cons:

asymmetric commented 6 years ago

For reference, two operators that deal with persistent data: 1 and 2.

asymmetric commented 6 years ago

Interesting article detailing some pros and cons.

asymmetric commented 6 years ago

Another option (suggested by @blixtra) is that we support both Deployments and StatefulSets, depending on the specific use case. For example, Habitat services that require persistence and can benefit from what StatefulSets offer would be deployed as StatefulSets.

jeremymv2 commented 6 years ago

Chef Server is a great example of an app we would love to deploy as a StatefulSet, since Elasticsearch and PostgreSQL are backends in the stack that require stable, persistent storage.

asymmetric commented 6 years ago

@jeremymv2 The article you posted doesn't make a very compelling case for using StatefulSets IMO. The main point it makes is that StatefulSets have terminationGracePeriodSeconds, but I'm not sure how important that is for us.

jeremymv2 commented 6 years ago

@asymmetric I've been going back and forth myself trying to understand when persistent storage wouldn't make sense to implement under a Deployment in the operator.

My number one concern is to avoid the possibility of data corruption via concurrent pod access if Deployment replicas are > 1.

There is some good info here regarding storage guarantees for pods: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md#guarantees-provided-by-replica-sets-and-replication-controllers

Reading the above gives me some cause for concern:

> ReplicaSets and ReplicationControllers both attempt to preserve availability of their constituent pods over ensuring at most one (of a pod) semantics. So a replica set to scale 1 will immediately create a new pod when it observes an old pod has begun graceful deletion, and as a result at many points in the lifetime of a replica set there will be 2 copies of a pod's processes running concurrently. Only access to exclusive resources like storage can prevent that simultaneous execution.
>
> Deployments, being based on replica sets, can offer no stronger guarantee.

One thing I'm still unclear on, though: could a PersistentVolumeClaim, which is usable in a Deployment, be the extra piece that ensures there is no concurrent-access violation?
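
For concreteness, my understanding of the difference (happy to be corrected; names, image and size below are made up): any PersistentVolumeClaim referenced in a Deployment's Pod template is shared by every replica, whereas a StatefulSet's `volumeClaimTemplates` creates one claim per ordinal, so each Pod gets its own volume. A minimal, illustrative sketch:

```yaml
# Illustrative only: each replica gets its own PVC via volumeClaimTemplates
# (a Deployment's Pod template can only point all replicas at a shared claim).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example-habitat-db            # hypothetical name
spec:
  serviceName: example-habitat-db     # headless Service the StatefulSet expects
  replicas: 3
  selector:
    matchLabels:
      app: example-habitat-db
  template:
    metadata:
      labels:
        app: example-habitat-db
    spec:
      containers:
      - name: service
        image: example/habitat-service   # placeholder image
        volumeMounts:
        - name: data
          mountPath: /hab/svc/example/data
  volumeClaimTemplates:                  # creates data-example-habitat-db-0, -1, -2
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
```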

asymmetric commented 6 years ago

It seems the issues outlined here make implementing persistence with Deployments a bad idea: Pods would end up sharing Volumes, which means DB processes would have to compete for access to the same filesystem.

Unless there are other considerations, I'd consider this closed.

/cc @blixtra.

lilic commented 6 years ago

@asymmetric Why close this? I thought we agreed to go with StatefulSets? :)

blixtra commented 6 years ago

Seems like you've all done the research, so if the outcome is to use StatefulSets, then fine.

But the question I still see open is whether it's that much extra work to support both StatefulSets and Deployments. What are the disadvantages for stateless applications if we go with an all-StatefulSet solution? How much does the ordering requirement affect stateless applications in practice: slower to deploy and remove, would there always be a mount even if unused, etc.?

asymmetric commented 6 years ago

@LiliC Sorry, I thought the title was "Should we...?", and since we answered that, we could close :)

asymmetric commented 6 years ago

@blixtra as @LiliC mentioned, the ordering can be relaxed.

To answer your questions from before:

asymmetric commented 6 years ago

Just realized we don't necessarily need the Headless Service.

Habitat services use the gossip protocol to find each other (as long as they can bootstrap with at least one peer), and they find each other by IP, which means that the main use of the Headless Service, i.e. returning a list of DNS names for the members, does not apply to us.
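
For reference, the thing we would be skipping is roughly this (names are illustrative): a headless Service gives each StatefulSet Pod a stable DNS record such as `example-habitat-0.example-habitat.default.svc.cluster.local`, but since the Supervisors gossip by IP once they have bootstrapped off a peer, we don't need those records:

```yaml
# Illustrative only: a headless Service (clusterIP: None) that, paired with a
# StatefulSet, would yield stable per-Pod DNS names; not needed for gossip by IP.
apiVersion: v1
kind: Service
metadata:
  name: example-habitat        # hypothetical name
spec:
  clusterIP: None              # headless: DNS returns the individual Pod records
  selector:
    app: example-habitat
  ports:
  - name: gossip
    port: 9638                 # Habitat Supervisor gossip port
```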

asymmetric commented 6 years ago

Another thing (as mentioned in today's standup):

There's the possibility that we could use an alternative approach to ring joining, instead of the current peer-watch-file + ConfigMap based one (rough sketch at the end of this comment):

This would have the advantage of being more Kubernetes-native, and the downside of forcing us to start a Headless Service.

Not saying we should do this, just writing thoughts down.
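
To sketch what that could look like (the flags are my assumption of how `hab sup run --peer` would be wired up, and all names are made up): with a StatefulSet plus a Headless Service, ordinal 0 gets a predictable DNS name, so every Supervisor could be pointed at it directly instead of at a peer-watch-file kept up to date via a ConfigMap:

```yaml
# Illustrative excerpt of a StatefulSet Pod template (not a complete manifest).
# Assumes a headless Service named "example-habitat" in the same namespace,
# so ordinal 0 is reachable as example-habitat-0.example-habitat.
containers:
- name: habitat-service
  image: example/habitat-service            # placeholder image
  command: ["hab", "sup", "run"]
  args:
  - "--peer"
  - "example-habitat-0.example-habitat"     # bootstrap off the first Pod in the set
```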

jeremymv2 commented 6 years ago

@asymmetric That's an interesting strategy, and in truth, I don't think you even need the Headless Service; just setting the Pod's hostname is enough. I've got a POC of that working here: https://github.com/jeremymv2/launch-chef-in-kubernetes/blob/master/chef-server-pod.yml#L46-L47

This allows the services in the Pod to form the ring.
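
In essence it comes down to setting a fixed `hostname` on the Pod spec; a trimmed-down sketch of the idea (not the full manifest from the POC):

```yaml
# Illustrative paraphrase: a bare Pod with a fixed hostname, which is what
# lets the services inside it form the ring (see the linked POC for the real manifest).
apiVersion: v1
kind: Pod
metadata:
  name: chef-server             # hypothetical name
spec:
  hostname: chef-server         # stable hostname inside the Pod
  containers:
  - name: chef-server
    image: example/chef-server  # placeholder image
```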

asymmetric commented 6 years ago

The main thing I'm trying to figure out now is whether we want to let users decide what kind of PersistentVolume they will provision, or if the operator should/can force the decision.

The downside of us doing it is that designing a solution that both works across environments and is easy to use has eluded me so far.


So I guess that leaves us with either:

- static provisioning, with PersistentVolumes pre-created by an admin, or
- allowing users to define their own storage classes, with dynamic provisioning.

The static case would look like this:

The dynamic one would look like this:

Some links:

Let me know if I missed something.
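
As a very rough illustration of the two cases on the Kubernetes side (all names, classes and sizes are made up, and this isn't a commitment to how the operator would expose them): in the static case an admin pre-creates a PersistentVolume and a claim requesting the same class binds to it; in the dynamic case the claim names a StorageClass and the cluster provisions a volume on demand.

```yaml
# Illustrative only: static provisioning - an admin pre-creates the PV,
# and a claim requesting the same storageClassName/size binds to it.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: habitat-data-pv          # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  storageClassName: manual
  hostPath:                      # toy backend; a real cluster would use NFS, EBS, etc.
    path: /mnt/habitat-data
---
# Illustrative only: dynamic provisioning - the claim references a StorageClass
# and a matching volume is provisioned on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: habitat-data             # hypothetical name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard     # user-chosen class; must exist in the cluster
  resources:
    requests:
      storage: 10Gi
```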

lilic commented 6 years ago

> allowing users to define their own storage classes, with dynamic provisioning

I would strongly say we should go for that option. There are many different types of volumes for a reason, and each use case needs its own; choosing for users beforehand is impossible, because we can't predict their desired use case.
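
To make that concrete, a user-defined StorageClass is just another object users would create and then reference by name from the Habitat object (the provisioner below is the AWS EBS one, purely as an example):

```yaml
# Illustrative only: a user-defined StorageClass for dynamic provisioning.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                      # hypothetical name the user would reference
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2                       # EBS general-purpose SSD
```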

jeremymv2 commented 6 years ago

Agreed, dynamic provisioning is the only way to go here if this is to be widely adopted.

> When a Habitat object is created, the PersistentVolume is automatically created and mounted on the Pod

I'm curious if it will also be possible to utilize a PersistentVolume that has been pre-provisioned by an admin?

asymmetric commented 6 years ago

@jeremymv2 Yes, that will be possible. It all depends on what StorageClass the user specifies in the CRD. If the StorageClass matches the one provided by an existing PersistentVolume object, that's the one that will be used.

From the docs:

> When none of the static PVs the administrator created matches a user’s PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC