Closed sidharthkumarpradhan closed 7 months ago
Hi @sidharthkumarpradhan ... model-mesh does not work with Knative; regular kube Deployments are created/managed by the modelmesh-serving controller.
Because it was designed to manage large numbers of smallish models, the autoscaling happens by loading/unloading copies of models within a static set of pods. You can configure how many pods (per runtime), but it's not dynamic currently. It will scale them to zero however if you don't have any InferenceServices created that need the particular runtime.
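To illustrate the static pod count described above, here is a hedged sketch of a ServingRuntime with an explicit replica count (the `replicas` field is my understanding of the modelmesh-serving ServingRuntime CRD; the runtime name, model format, and image are assumed placeholders):

```yaml
# Sketch only: modelmesh-serving manages this fixed set of pods itself,
# scaling *model copies* across them rather than the pods.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-runtime            # assumed name
spec:
  replicas: 3                        # static pod count for this runtime
  supportedModelFormats:
    - name: my-format                # assumed format
      version: "1"
  containers:
    - name: my-runtime
      image: example.com/my-runtime:latest  # assumed image
```

The controller scales these pods to zero only when no InferenceService requires the runtime, as noted above.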
So currently it is not compatible with HPA .. the HPA would fight the mm-serving controller to set the replica count. There are plans to add a config option to make this possible soon however, see https://github.com/kserve/modelmesh-serving/issues/329
Thanks Njhill for your valuable inputs. Could you kindly tell us when the "HPA" support will be available? It will be really helpful for our case. In the meantime, could you kindly give us a solution for the scaling, if it is possible by any means. Thank you.
Issue-1: We are trying to autoscale the custom deployed runtime. We have tried to specify the annotation and predictor parameters in the InferenceService manifest, but the scaling is not happening.
example:
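(The original manifest was not included in the issue; the following is a hedged sketch of what such an attempt might look like, with all names, formats, and the storage URI being assumed placeholders. Per the comment above, model-mesh does not honor the predictor replica fields dynamically:)

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                     # assumed name
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    minReplicas: 1                   # replica settings like these are not
    maxReplicas: 5                   # acted on dynamically by model-mesh
    model:
      modelFormat:
        name: my-format              # assumed format
      storageUri: s3://bucket/path   # assumed location
```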
Issue-2:
Then we tried to scale the custom runtime deployment by creating an HPA (Horizontal Pod Autoscaler). Although the runtime pods are getting spun up, we are not able to distribute load across all the pods (load balancing is not happening), and the pods are terminated as soon as they come up. Below is the HPA manifest that we are using.
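(The actual HPA manifest was not included in the issue; a representative sketch of such an HPA is below, with the HPA name and the target Deployment name being assumptions. The termination behavior described is consistent with the modelmesh-serving controller reconciling the replica count back, as mentioned earlier in the thread:)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-runtime-hpa                           # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: modelmesh-serving-my-custom-runtime    # assumed deployment name
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```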
Kindly help us figure out an autoscaling solution for a custom runtime on Model Mesh. Thank you.