sablierapp / sablier

Start your containers on demand, shut them down automatically when there's no activity. Docker, Docker Swarm Mode and Kubernetes compatible.
https://sablierapp.dev/
GNU Affero General Public License v3.0

Inquiry for use-case when nodepool is also scaled to zero #61

Closed christidis closed 2 years ago

christidis commented 2 years ago

I need to deploy a GPU service in GKE with Traefik which will respond to HTTP processing requests (as an API not for users) and most time I need this scaled to zero. The nodepool will be also configured to scale down to zero when there are no pods deployed in it.

When there is HTTP traffic I'd expect Sablier to intercept the requests and deploy the pods. Then GKE should be responsible to provision the nodes and download the images in order to deploy the services. The whole cold-start time may take up to 5 minutes.

When there is no traffic, Sablier should scale the application down to zero and GKE should scale the nodepool down to zero.

Is Sablier suitable for this use-case?

Also, one thing I couldn't figure out from the docs: when there is an HTTP request, how many replicas are deployed? Does Sablier provide a queueing system that monitors HTTP load and scales dynamically, or does it just deploy the configured number of replicas, e.g. 10 replicas?

Last but not least is there a minimum Traefik version in order to use Sablier as a plugin with it?

acouvreur commented 2 years ago

Hi,

I think it matches your use case; however, the node pool won't be scaled up by Sablier itself. Sablier runs inside your GKE node pool, right? So it depends on how the cold start is triggered in the first place.

I recommend watching the beta branch over the next few weeks, as I will be updating a lot of things.

I already added some E2E tests that can help you understand the different usages of the plugin.

I'm writing a lot of documentation right now :)

acouvreur commented 2 years ago

> Also, one thing I couldn't figure out from the docs: when there is an HTTP request, how many replicas are deployed? Does Sablier provide a queueing system that monitors HTTP load and scales dynamically, or does it just deploy the configured number of replicas, e.g. 10 replicas?

See https://github.com/acouvreur/sablier/blob/beta/KUBERNETES.md#creating-a-middleware, which states:

> The format of the `name:` section is `<KIND>_<NAMESPACE>_<NAME>_<REPLICACOUNT>` where `_` is the delimiter.
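As a concrete illustration, a Traefik `Middleware` manifest using that naming convention might look like the sketch below. The metadata names, the Sablier service URL, and the session duration are placeholders, and the exact plugin option keys may differ between Sablier versions, so check the linked docs for your release:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: sablier-gpu-api       # hypothetical middleware name
  namespace: default
spec:
  plugin:
    sablier:
      # <KIND>_<NAMESPACE>_<NAME>_<REPLICACOUNT>
      # -> scale the "gpu-api" Deployment in "default" to 1 replica
      name: deployment_default_gpu-api_1
      sessionDuration: 1m    # placeholder inactivity window
```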

christidis commented 2 years ago

Traefik Ingress and Sablier won't be installed in the same nodepool as the application.

In theory this is what I'd expect.

If Sablier scales the Deployment to 0, Kubernetes will eventually scale the nodepool down to zero. When an HTTP request arrives and Sablier spawns the pod(s), it is Kubernetes' responsibility to provision the nodepool first. Pods will stay unschedulable (Pending, then ContainerCreating while images are pulled) until a node is available, and then they will be assigned to it.
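The interception step in this flow comes down to attaching the Sablier middleware to the Traefik route in front of the service. A hedged sketch, with all names hypothetical and the middleware assumed to be defined separately:

```yaml
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: gpu-api                       # hypothetical route name
  namespace: default
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`gpu-api.example.com`)   # placeholder host
      kind: Rule
      middlewares:
        - name: sablier-gpu-api       # the Sablier plugin middleware
      services:
        - name: gpu-api               # the scaled-to-zero backend
          port: 80
```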

The reason I am exploring this setup is that GPU-enabled nodes can be pricey and serverless services do not support GPUs (not to mention that the k8s ecosystem works well for my team). So this seems like a promising solution I'd like to evaluate (along with KEDA and KEDA HTTP).

OK great, thanks for your feedback. I think I can close this inquiry. I'll keep an eye on the beta branch over the next couple of weeks and give it a try at some point!

Cheers!

christidis commented 1 year ago

This is to confirm that everything works as expected.

I am deploying a GPU-enabled Deployment in GKE with replicas set to 0 by default. Upon the first request, Sablier scales the Deployment up to the configured number of replicas. It is then Kubernetes/GKE's responsibility to trigger the nodepool scale-up. When the nodepool becomes available, the pods are deployed and everything is set.

On the scale-down side, when there is inactivity for the configured period, Sablier scales the Deployment down to 0. For the nodepool, again, it is GKE's responsibility, depending on how you have configured your nodepool, to scale it to zero.

Of course, there is considerable cold-start waiting time when your infrastructure is scaled down to zero, but if your aim is to reduce costs and your workflow can afford it, then you should be OK.
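The setup described above can be sketched as a minimal Deployment manifest. All names and the image are placeholders; the relevant parts are `replicas: 0` as the resting state (Sablier raises it on traffic) and the GPU resource request, which is what pins the pods to the GPU nodepool and lets the cluster autoscaler scale that pool from and to zero:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-api                 # hypothetical app name
  namespace: default
spec:
  replicas: 0                   # scaled to zero by default; Sablier scales up on demand
  selector:
    matchLabels:
      app: gpu-api
  template:
    metadata:
      labels:
        app: gpu-api
    spec:
      containers:
        - name: api
          image: example.com/gpu-api:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1   # keeps pods on the GPU nodepool
```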

acouvreur commented 1 year ago

> This is to confirm that everything works as expected.
>
> I am deploying a GPU-enabled Deployment in GKE with replicas set to 0 by default. Upon the first request, Sablier scales the Deployment up to the configured number of replicas. It is then Kubernetes/GKE's responsibility to trigger the nodepool scale-up. When the nodepool becomes available, the pods are deployed and everything is set.
>
> On the scale-down side, when there is inactivity for the configured period, Sablier scales the Deployment down to 0. For the nodepool, again, it is GKE's responsibility, depending on how you have configured your nodepool, to scale it to zero.
>
> Of course, there is considerable cold-start waiting time when your infrastructure is scaled down to zero, but if your aim is to reduce costs and your workflow can afford it, then you should be OK.

Thanks for the feedback! I really appreciate it! I'll probably add some info or a guide related to your use case!