Auto cache when new node added to cluster

maacarbo commented 1 year ago

In AWS EKS, we make heavy use of autoscaling clusters. It would be handy if the controller knew when a new node is spun up and immediately started caching the images on it.

leonidkhelemes commented 1 year ago

+1

elocke commented 1 year ago

+1. I kind of expected this to happen already.

jaihwan104 commented 1 year ago

+1

djmcgreal-cc commented 1 year ago

My exact question, the top issue in the list!

This is likely a major use case in Machine Learning, where a) GPU nodes are more expensive, so clusters typically scale up and down often, and b) images are large.

In this auto-scale-up case, Pods are already waiting to be scheduled, so they will probably land on the new node before the kube-fledged cache refresh has loaded images onto it (which I assume at least works?). Perhaps kube-fledged could be configured to manage a taint on newly provisioned nodes that is removed once the cached images have been loaded. In cluster-autoscaler, taints can be prefixed with ignore-taint.cluster-autoscaler.kubernetes.io/ so they do not affect auto scaling group selection.
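
To make the taint idea concrete, here is a rough sketch of the reconcile logic in plain Python. It is not kube-fledged code and does not call the Kubernetes API: the Node class, the reconcile function, and the taint suffix image-cache-pending are all made up for illustration. Only the ignore-taint.cluster-autoscaler.kubernetes.io/ prefix comes from cluster-autoscaler.

```python
from dataclasses import dataclass, field

# Hypothetical taint key. Only the prefix is a real cluster-autoscaler convention;
# the "image-cache-pending" suffix is invented for this sketch.
CACHE_TAINT_KEY = "ignore-taint.cluster-autoscaler.kubernetes.io/image-cache-pending"


@dataclass
class Node:
    """Minimal stand-in for a cluster node (not a Kubernetes API object)."""
    name: str
    taints: set = field(default_factory=set)
    images: set = field(default_factory=set)


def reconcile(node, required_images):
    """Keep the taint on the node while any required image is missing.

    Returns the set of images that still need to be pulled.
    """
    missing = set(required_images) - node.images
    if missing:
        node.taints.add(CACHE_TAINT_KEY)
    else:
        node.taints.discard(CACHE_TAINT_KEY)
    return missing


# Example: a freshly provisioned node with nothing cached yet.
node = Node("gpu-node-1")
required = {"example.com/ml/train:1.0"}

reconcile(node, required)   # image missing, so the taint is applied
node.images |= required     # simulate the image pull finishing
reconcile(node, required)   # nothing missing, so the taint is removed
```

One caveat with this approach: if the controller only adds the taint after it notices the node, waiting Pods could be scheduled first. The taint would probably have to be present when the node registers, for example through the kubelet's --register-with-taints flag, with the controller responsible only for removing it.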

senthilrch / kube-fledged

Auto cache when new node added to cluster #213