senthilrch / kube-fledged

A Kubernetes operator for creating and managing a cache of container images directly on the cluster worker nodes, so application pods start almost instantly
Apache License 2.0

Auto cache when new node added to cluster #213

Open maacarbo opened 1 year ago

maacarbo commented 1 year ago

In AWS EKS, we make heavy use of auto-scaling clusters. It would be handy if the controller detected when a new node is spun up and immediately started caching the images on it.
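The core of the requested behaviour is "notice nodes that weren't there before, then pre-pull the cached images onto them." A minimal sketch of that idea, assuming the controller can compare two snapshots of node names. `new_nodes` and `plan_cache_jobs` are hypothetical helpers, not kube-fledged's actual code:

```python
from typing import Iterable, List, Set, Tuple


def new_nodes(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return names of nodes present in `current` but not in `previous`, sorted."""
    seen: Set[str] = set(previous)
    return sorted(set(current) - seen)


def plan_cache_jobs(added: List[str], images: List[str]) -> List[Tuple[str, str]]:
    """Pair every newly added node with every image that should be pre-pulled on it."""
    return [(node, image) for node in added for image in images]


if __name__ == "__main__":
    # Hypothetical example: node-c joined the cluster after a scale-up.
    added = new_nodes(["node-a", "node-b"], ["node-a", "node-b", "node-c"])
    print(added)
    print(plan_cache_jobs(added, ["nginx:1.25", "busybox:1.36"]))
```

In a real controller the snapshot diff would more naturally be replaced by reacting to Node "add" events from a watch or informer; the diff form is used here only to keep the sketch self-contained.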

leonidkhelemes commented 1 year ago

+1

elocke commented 1 year ago

+1, I kind of expected this to happen already.

jaihwan104 commented 1 year ago

+1

djmcgreal-cc commented 1 year ago

My exact question, the top issue in the list!

This is likely a major use case in machine learning, where a) GPU nodes are expensive, so clusters typically scale up and down often, and b) images are large.

In this auto-scale-up case, Pods are already waiting to be scheduled, so they will probably not be able to benefit from the kube-fledged cache refresh loading images onto the new node (which I assume at least works?). Perhaps kube-fledged could be configured to manage a taint on newly provisioned nodes that is removed once the cached images have been pulled onto the node. In cluster-autoscaler, taints can be prefixed with ignore-taint.cluster-autoscaler.kubernetes.io/ so they do not affect auto scaling group selection.
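To make the taint idea concrete, here is a minimal sketch of the removal decision, assuming the taint is applied when the node is provisioned (e.g. via the node group configuration) so the controller only ever has to remove it. It works on plain dicts shaped like Kubernetes Node objects (`spec.taints`, `status.images[].names`). The suffix of `CACHE_TAINT_KEY` and all function names are hypothetical, and exact string matching of image names is an assumption (the names reported in `status.images` may be normalized differently from the names in an image cache spec):

```python
from typing import Any, Dict, List, Set

# Hypothetical taint key: the prefix is the one cluster-autoscaler ignores,
# the "image-cache" suffix is made up for this sketch.
CACHE_TAINT_KEY = "ignore-taint.cluster-autoscaler.kubernetes.io/image-cache"


def cached_image_names(node: Dict[str, Any]) -> Set[str]:
    """Collect every name listed under the node's status.images entries."""
    names: Set[str] = set()
    for entry in node.get("status", {}).get("images") or []:
        names.update(entry.get("names") or [])
    return names


def missing_images(node: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required images not yet reported on the node (exact match)."""
    present = cached_image_names(node)
    return [image for image in required if image not in present]


def reconcile_taints(node: Dict[str, Any], required: List[str]) -> List[Dict[str, Any]]:
    """Return the taints the node should carry after checking its image cache.

    The cache taint is dropped only once every required image is present;
    otherwise a copy of the existing taints is returned unchanged."""
    taints = list(node.get("spec", {}).get("taints") or [])
    if missing_images(node, required):
        return taints
    return [t for t in taints if t.get("key") != CACHE_TAINT_KEY]
```

For example, a node tainted with `CACHE_TAINT_KEY` whose `status.images` already lists `docker.io/library/nginx:1.25` would keep the taint while `busybox:1.36` is still required, and lose it once only the nginx image is required. The controller would then write the returned list back to the Node.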