livekit / livekit-helm

LiveKit Helm charts
https://docs.livekit.io
Apache License 2.0
50 stars 59 forks source link

Add value option to choose DaemonSet instead of Deployment #86

Closed Stogas closed 9 months ago

Stogas commented 10 months ago

For example, we have a case where we add dedicated Kubernetes Worker nodes exclusively to run Livekit pods (as you know, Livekit can require a lot of CPU and network resources). In that case, we simply add specific Node Labels to those nodes, and then we can deploy this chart using deploymentType: DaemonSet and applicable nodeSelector values to deploy Livekit automatically for each new Kubernetes node with those labels.

CLAassistant commented 10 months ago

CLA assistant check
All committers have signed the CLA.

real-danm commented 9 months ago

I don't think using a Daemonset rather than a Deployment is a feature that we'd want to support due to the different expectations of these two kubernetes types such as logging, resources, and update strategies. A better way to do this is to continue using the node selector and then leverage the autoscaling section of the helm chart https://github.com/livekit/livekit-helm/blob/master/livekit-server/values.yaml#L71-L76

If your node pool is static, you can just scale the deployment up as you add nodes, or if it is autoscaling the deployment can manage this based on cpu/mem/etc.

Stogas commented 9 months ago

different expectations of these two kubernetes types such as logging, resources, and update strategies.

Would you mind elaborating a bit on this? We're currently running this in production and are not observing any irregularities with logging, resource allocation or rollout strategies.

A better way to do this is to continue using the node selector and then leverage the autoscaling section of the helm chart https://github.com/livekit/livekit-helm/blob/master/livekit-server/values.yaml#L71-L76 If your node pool is static, you can just scale the deployment up as you add nodes, or if it is autoscaling the deployment can manage this based on cpu/mem/etc.

If a user is already using a NodeSelector for Livekit, utilizing a DaemonSet results in a less complex (or rather, one with less steps) scaling setup (Node autoscaling only) compared to using a Deployment (Node autoscaling + Deployment pod autoscaling).

https://docs.livekit.io/realtime/self-hosting/kubernetes/

With that direct requirement of specific ports, it means we'll be limited to one LiveKit pod per node.

As DaemonSets are the exact type of workload designed to enforce a limit of 1 pod per node, I'd argue DaemonSets are a much closer native Kubernetes primite to enforce that deployment need.

Either way, I urge you to reconsider supporting DaemonSets, even if with a note that "DaemonSet usage is not tested extensively by Livekit inc.".

Additionally, we'll make sure to report any feedback and provide necessary PRs for any bugs or misbehavior we observe from already running Livekit in production using a DaemonSet.

real-danm commented 9 months ago

daemonset is a non-standard way to run something like livekit and would require manual scaling. it also ignores all the docs and work that have been created to ensure pod autoscaling works. Newer users of this helm chart could easily become confused on which to use and lead to problems down the road when things aren't working as expected.

If running as a daemonset is working for you, great! but wanting to keep this repo pretty clean to ensure clarity and parity with current documentation

Stogas commented 9 months ago

I appreciate you taking the time to discuss this, I really do.

However, aside from DaemonSets not being a documented way of deploying Livekit in Kubernetes yet, I have yet to find any concrete evidence as to why it is not possible or would result in a worse deployment scenario.

.

daemonset is a non-standard way to run something like livekit

How so? Again, requiring to run a maximum of one pod per node seems exactly what DaemonSets were meant to do. "Non-standard" here is correct merely because DaemonSets are not implemented in this Helm chart, not because it's somehow a worse deployment method, IMO.

would require manual scaling

It can be fully auto-scaled using just horizontal node autoscaling (which is also needed for full autoscaling via Deployments anyways, if you don't want an upper limit for scale).

ignores all the docs and work that have been created to ensure pod autoscaling works

Providing an alternative way to deploy does not invalidate the work already done on the "main" method of deployment. It's a non-zero sum scenario, IMO.

Newer users of this helm chart could easily become confused on which to use <...> ensure clarity and parity with current documentation

This seems easily solved by adding information to the documentation. I can offer help in writing in the additional details as needed, if that makes it easier.

.

I respect Livekit inc's decision to not support this in the official Helm chart, but I'd love a more concrete explanation as to why it's being rejected.

Thank you!

Stogas commented 9 months ago

Is there any possibility of a detailed response to my questions?

davidzhao commented 9 months ago

@Stogas, we appreciate your effort in trying to improve the helm chart. However, we do not want to create paths for deployment as it increases the support burden.

when someone is having issues with daemonset deployment, we will not be able to offer them any help at all.