piraeusdatastore / piraeus-ha-controller

High Availability Controller for stateful workloads using storage provisioned by Piraeus
Apache License 2.0

Applying the on-storage-lost label outside of StatefulSets #3

Closed: immanuelfodor closed this issue 3 years ago

immanuelfodor commented 3 years ago

The readme (https://github.com/piraeusdatastore/piraeus-ha-controller#deploy-your-stateful-workloads) gives an example of using the linstor.csi.linbit.com/on-storage-lost: remove label on a StatefulSet and mentions stateful applications. I was wondering if I could use the label for other kinds of pods, namely Deployment, DaemonSet or plain Pod. I'm not that proficient in Golang, but the source here (https://github.com/piraeusdatastore/piraeus-ha-controller/blob/56f633f9363be272bee40be9bc868c585f8695ae/pkg/hacontroller/controller.go) only deals with pods, so it seems to me I could use the label outside of StatefulSets as well.

As StatefulSet members have their own PVCs created automatically, it's natural that replicas don't interfere with each other. The same goes for DaemonSet members running on each node. What I'm more interested in is the Deployment kind. I'm running most of my apps as Deployments with a replica count of 1. Would the HA Controller work with these to speed up PV reattachment on another node when the original node goes down? (I could naturally test it by killing a node, but I'd rather not :grinning: ) Maybe a clarification of what counts as a stateful app would be advisable in the HA Controller and operator readmes once this issue is answered.
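
For reference, this is roughly the kind of setup I mean, with the label added to the Deployment's pod template (all names and the image are placeholders, not taken from the readme):

```yaml
# Hypothetical single-replica Deployment consuming a Piraeus-provisioned PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
        # let the HA Controller force-remove the pod if its storage is lost
        linstor.csi.linbit.com/on-storage-lost: remove
    spec:
      containers:
        - name: my-app
          image: my-app:latest   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: my-app-data   # PVC backed by a Piraeus storage class
```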

Bonus question: what's the case with Deployments with a replica count >1? (This is more of a general Piraeus usage question, not strictly HA Controller-related.) I've observed a small delay when a Deployment had a pod running on one node, I deleted that pod, and the scheduler started another pod on another node. In this case, the replica count was effectively 2, as one pod was in Terminating state and the new one in ContainerInitializing state. The new pod couldn't flip to Running state until the first one terminated, and even then there was a small delay of around 10-15 s until the PV was reassigned from the first node to the new one. The related PVC was defined with ReadWriteMany. Would Piraeus be able to mount the same PV on two nodes at a time? (I suppose not, as there would be two "primary" mounts and that could interfere with the underlying DRBD replication. I think I've read something like this somewhere but I'm not sure where.) Would it work with a Deployment replica count >1 only if the replicas are all running on the same node?

WanzenBug commented 3 years ago

In essence: it will work with any Pod carrying the on-storage-lost: remove label. That is of course only useful if the Pod is managed by something (a Deployment, StatefulSet, etc.) that will recreate it once it has been removed.

Since Piraeus does not support the RWX access mode, you can't really use such PVCs in Deployments with more than one replica. In theory it supports the CSI spec for multi-attach on a single node, but as far as I know this is not yet exposed in Kubernetes, so you can't really use it.
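
For a single-replica Deployment the PVC should then simply request ReadWriteOnce, roughly like this (the storage class name is just a placeholder for your Piraeus storage class):

```yaml
# Sketch of a ReadWriteOnce PVC for a single-replica workload
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: linstor-csi   # placeholder, use your Piraeus storage class
  resources:
    requests:
      storage: 1Gi
```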

immanuelfodor commented 3 years ago

Thank you very much for explaining how it works! I was not aware that only RWO is supported; I'll recreate the PVCs as such just to make them future-proof, and start using the remove label on the pods consuming Piraeus PVCs to make use of the HA Controller.