**Closed** — @lilic closed this issue 6 years ago.
I think that for the moment, we expect our services to be stateless, and that stateful apps will be outside of the cluster.
So maybe this can be re-visited at a later point?
Cons:

> If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application with a controller that provides a set of stateless replicas.
Interesting article detailing some pros and cons.
Another option (suggested by @blixtra) is that we support both Deployments and StatefulSets, depending on the specific use case. For example, Habitat services that require persistence and would benefit from the features of StatefulSets would be deployed as those.
Chef Server is a great example of an app we would love to deploy as a StatefulSet, since Elasticsearch and PostgreSQL are the backends in the stack, and they require stable, persistent storage.
@jeremymv2 The article you posted doesn't make a very compelling case for using StatefulSets IMO. The main point it makes is that StatefulSets have `terminationGracePeriodSeconds`, but I'm not sure how important that is for us.
@asymmetric I've been going back and forth myself trying to understand when persistent storage wouldn't make sense to implement under a Deployment in the operator.
My number one concern is to avoid the possibility of data corruption via concurrent pod access if Deployment replicas are > 1.
There is some good info here regarding storage guarantees for pods: https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/pod-safety.md#guarantees-provided-by-replica-sets-and-replication-controllers
Reading the above gives me some pause for concern:

> ReplicaSets and ReplicationControllers both attempt to preserve availability of their constituent pods over ensuring at most one (of a pod) semantics. So a replica set to scale 1 will immediately create a new pod when it observes an old pod has begun graceful deletion, and as a result at many points in the lifetime of a replica set there will be 2 copies of a pod's processes running concurrently. Only access to exclusive resources like storage can prevent that simultaneous execution.
>
> Deployments, being based on replica sets, can offer no stronger guarantee.
One thing I'm still unclear on, though: could a `PersistentVolumeClaim`, which is usable in a Deployment, be the extra item that ensures there is no concurrent access violation?
It seems the issues outlined here make implementing persistence with `Deployment`s a bad idea: `Pod`s would end up sharing `Volume`s, which means DB processes would have to compete for access to the same filesystem.
Unless there are other considerations, I'd consider this closed.
/cc @blixtra.
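To make the sharing problem concrete, here's a rough sketch (all names and images are illustrative, not from the operator): a Deployment can only reference a single pre-existing claim, which every replica mounts, while a StatefulSet's `volumeClaimTemplates` stamp out one claim per pod.

```yaml
# Deployment: all replicas mount the SAME PersistentVolumeClaim,
# so two concurrently running pods can write to one filesystem.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db
spec:
  replicas: 2
  selector:
    matchLabels: {app: db}
  template:
    metadata:
      labels: {app: db}
    spec:
      containers:
      - name: db
        image: postgres:9.6
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: db-data   # one shared claim for all replicas
---
# StatefulSet: volumeClaimTemplates create one PVC per pod
# (data-db-0, data-db-1, ...), so pods never share storage.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db
  replicas: 2
  selector:
    matchLabels: {app: db}
  template:
    metadata:
      labels: {app: db}
    spec:
      containers:
      - name: db
        image: postgres:9.6
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ReadWriteOnce]
      resources:
        requests:
          storage: 10Gi
```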
@asymmetric Why close this? I thought we agreed to go with `StatefulSet`s? :)
Seems like you've all done the research. So if the outcome is to use StatefulSets then fine.
But the question I still see open is whether it's really so much extra work to support both StatefulSets and Deployments. What are the disadvantages for stateless applications if we go with an all-StatefulSet solution? How much does the ordering requirement affect stateless applications in practice: slower to deploy and remove, would there always be a mount even if not used, etc.?
@LiliC Sorry, I thought the title was "Should we...?", and since we answered that, we could close :)
@blixtra as @LiliC mentioned, the ordering can be relaxed.
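For reference, the ordering guarantee can be relaxed via `podManagementPolicy` (available since Kubernetes 1.7), so stateless apps in a StatefulSet wouldn't pay the sequential startup cost. A minimal sketch (names illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example
  replicas: 3
  # Launch and terminate all pods in parallel instead of one at a
  # time in ordinal order (OrderedReady is the default).
  podManagementPolicy: Parallel
  selector:
    matchLabels: {app: example}
  template:
    metadata:
      labels: {app: example}
    spec:
      containers:
      - name: app
        image: nginx
```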
To answer your questions from before:
Just realized we don't necessarily need the Headless Service.
Habitat services use the gossip protocol to find each other (as long as they can bootstrap with one peer), and they find each other by IP; which means that the main use of the Headless Service, i.e. returning a list of DNS names of members, does not apply to us.
Another thing (as mentioned in today's standup):
There's the possibility that we could use an alternative approach to ring joining than the current `peer-watch-file`+`ConfigMap` based one:

- start each supervisor with `--peer X`, where X is the DNS name of one of the nodes (meaning, one of the supervisors would have itself as the bootstrap)
- e.g. start each `Pod` with `--peer $hostname-of-pod-0` as argument, and the ring would be formed

This would have the advantage of being more Kubernetes-native, and the downside of forcing us to start a Headless Service.
Not saying we should do this, just writing thoughts down.
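A rough sketch of that idea (names hypothetical; assumes a headless Service called `hab-svc` and the standard StatefulSet DNS scheme `<pod>.<service>`):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hab
spec:
  serviceName: hab-svc        # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels: {app: hab}
  template:
    metadata:
      labels: {app: hab}
    spec:
      containers:
      - name: sup
        image: habitat/example-service   # hypothetical image
        # Every supervisor bootstraps off pod 0's stable DNS name;
        # pod 0 simply peers with itself, which is harmless.
        args: ["--peer", "hab-0.hab-svc"]
```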
@asymmetric That's an interesting strategy, and in truth I don't think you even need the Headless Service; just the use of `hostname` is enough for the pod. I've got a POC of that working here: https://github.com/jeremymv2/launch-chef-in-kubernetes/blob/master/chef-server-pod.yml#L46-L47

This allows the services in the POD to form the ring.
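The relevant trick looks roughly like this (simplified; container names and images are illustrative, details may differ from the linked manifest): the Pod sets an explicit `hostname`, which the kubelet makes resolvable inside the pod, and since all containers in a Pod share one network namespace, the supervisors can peer via that name without any Service.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: chef-server
spec:
  hostname: chef-server   # resolvable from within the pod itself
  containers:
  - name: postgresql
    image: example/hab-postgresql     # illustrative image
    args: ["--peer", "chef-server"]
  - name: elasticsearch
    image: example/hab-elasticsearch  # illustrative image
    args: ["--peer", "chef-server"]
```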
The main thing I'm trying to figure out now is whether we want to let users decide what kind of `PersistentVolume` they will provision, or if the operator should/can force the decision.
The downside with us doing it is that designing a solution that works (across environments) and is easy to use has eluded me so far.
- `hostPath` doesn't work on multi-node setups, as it has no notion of node affinity (so it's a no-go)
- `local` is alpha, and doesn't yet support dynamic provisioning
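For contrast, this is what the node-affinity difference looks like: a `local` PV carries an explicit node affinity, so the scheduler keeps the pod on the node that actually holds the data, which `hostPath` can't express. (At the time this was behind an alpha annotation; current Kubernetes uses the `nodeAffinity` field shown here. Names and paths are illustrative.)

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-0
spec:
  capacity:
    storage: 10Gi
  accessModes: [ReadWriteOnce]
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd0   # directory that exists on the target node
  # Pins pods using this volume to the node that has the data;
  # hostPath has no equivalent, so pods can land on the wrong node.
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values: [node-1]
```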
So I guess that leaves us with either:

- `local` and static provisioning, or
- allowing users to define their own storage classes, with dynamic provisioning

The static case would look like this:

- we create a `StorageClass` with provisioner = `local` and a default name (`foo`)
- we create a `PersistentVolume` object for each Pod in the `StatefulSet`, with storageClass = `foo`
- the `StatefulSet` binds the PVC to the PV
- this means we need to know the number of `Pod`s in advance, and create `PV`s accordingly

The dynamic one would look like this:

- enable the `DefaultStorageClass` admission controller
- users create a `StorageClass`, with a `provisioner` field of their choosing (e.g. `kubernetes.io/glusterfs`)
- users reference the `StorageClass` in the CRD
- when a `Habitat` object is created, the `PersistentVolume` is automatically created and mounted on the `Pod`
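A sketch of the dynamic flow (the StorageClass name is illustrative, and the PVC shown is the kind the operator would generate, not an agreed-on schema): the user creates a StorageClass with their provisioner, and each pod's claim references it, so PVs are provisioned on demand.

```yaml
# Created by the user: any provisioner they like.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: habitat-storage           # illustrative name
provisioner: kubernetes.io/glusterfs
---
# Generated by the operator from the Habitat object: the claim
# references the user's StorageClass, so the PV is created on demand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-habitat-0
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: habitat-storage
  resources:
    requests:
      storage: 10Gi
```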
Some links:
Let me know if I missed something.
> allowing users to define their own storage classes, with dynamic provisioning
I would strongly say we should go for that option: there are many different types of volumes for a reason, and each use case needs its own. Choosing one for the user beforehand is impossible, since we can't predict their desired use case.
Agreed, Dynamic is the only way to go here if it is to be widely adopted.
> When a Habitat object is created, the PersistentVolume is automatically created and mounted on the Pod
I'm curious if it will also be possible to utilize a `PersistentVolume` that has been pre-provisioned by an admin?
@jeremymv2 Yes, that will be possible. It all depends on what `StorageClass` the user specifies in the CRD. If the `StorageClass` matches the one provided by an existing `PersistentVolume` object, that's the one that will be used.
From the docs:
> When none of the static PVs the administrator created matches a user's PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC
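Concretely (names illustrative): a pre-provisioned PV is matched before any dynamic provisioning is attempted, as long as its `storageClassName`, capacity, and access modes satisfy the claim.

```yaml
# Pre-provisioned by an admin.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: admin-pv
spec:
  capacity:
    storage: 10Gi
  accessModes: [ReadWriteOnce]
  storageClassName: habitat-storage   # must match the claim below
  nfs:                                # any volume source works here
    server: nfs.example.com
    path: /exports/habitat
---
# The claim the operator creates; it binds to admin-pv because the
# storageClassName, size, and access mode all match.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-habitat-0
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: habitat-storage
  resources:
    requests:
      storage: 10Gi
```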
Currently we are using Deployments to deploy our Habitat services, but since we do not know what we are deploying or what type of service it is (it could be anything from a DB to a simple Rails application), we should not just assume our Habitat service is stateless.
A couple of advantages of StatefulSets:

These would be very useful, especially if our service is, for example, a DB.