senthilrch / kube-fledged

A Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so that application pods start almost instantly.
Apache License 2.0
1.24k stars 118 forks

Feature: Throttling job creation #161

Open 1337andre opened 2 years ago

1337andre commented 2 years ago

We have a medium-sized cluster with 20 nodes. When we try to cache, e.g., an ImageCache CRD with 50 images, kube-fledged spawns 1000 jobs (20 nodes × 50 images) to pull those images. We can see the API server getting into trouble, and some workloads, e.g. Redis, then have problems getting an HA Redis cluster working. Does anyone have similar problems? Is it possible to throttle job creation?

bkupidura commented 2 years ago

I observed the same behavior in my home lab. When bumping the versions of all my images (~21 managed by the CRD), my kube-apiserver becomes unresponsive.

It would be nice if we could limit how many images are downloaded at the same time.

senthilrch commented 2 years ago

This is a much needed feature. Agreed.

linuraj commented 2 years ago

We tried throttling the pod count via a ResourceQuota in the kube-fledged namespace. That didn't help either. We'd appreciate your help!

senthilrch commented 1 year ago

I've implemented a solution for throttling the jobs created by kube-fledged. It will be delivered in the v0.11.0 release.

aledeulo commented 1 year ago

Hi, I would like to ask when you plan to release v0.11.0. Thank you very much.

thomson131 commented 7 months ago

@senthilrch, first of all, thank you for your work on this project; I'm looking forward to the next release.

For those using this project and waiting for the next release to better handle job throttling, a potential workaround is to apply both a pods quota and a count/jobs.batch quota to the namespace. The operator seems to handle job creation gracefully: it waits for quota to become available and works through the imageCache list until completion within the resource constraints.
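A minimal sketch of that workaround as a single manifest, assuming kube-fledged creates its image-pull jobs in a namespace named `kube-fledged` (adjust to your install). The object name `image-pull-throttle` and the limit of 20 are hypothetical values, not recommendations from the project. In Kubernetes, the `pods` entry caps pods in a non-terminal state, while `count/jobs.batch` caps the number of Job objects that exist in the namespace.

```yaml
# Sketch only: caps concurrent image-pull work by limiting both pods and Jobs.
# Assumptions: namespace "kube-fledged" and the value "20" are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: image-pull-throttle   # hypothetical name
  namespace: kube-fledged     # assumed kube-fledged install namespace
spec:
  hard:
    pods: "20"                # at most 20 non-terminal pods in the namespace
    count/jobs.batch: "20"    # at most 20 Job objects in the namespace
```

Apply it with `kubectl apply -f <file>` and watch consumption with `kubectl describe resourcequota -n kube-fledged`. Because the Job count includes completed Jobs, the queue only keeps draining if finished pull jobs are deleted; the behavior reported above suggests they are, but it is worth confirming on your kube-fledged version.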