kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

Specify model size in the InferenceService CRD #392

Open andreapairon opened 1 year ago

andreapairon commented 1 year ago

It would be nice to have a new parameter in the InferenceService CRD that lets the user specify the model size (in bytes), avoiding the MODEL_MULTIPLIER factor used to estimate it.

Is your feature request related to a problem? If so, please describe. The heuristic used to estimate model size (size on disk * MODEL_MULTIPLIER) is not always accurate: the memory a model actually consumes on a GPU can be greater, which can lead to OOM errors. Because of this, the total number of models that can stay loaded on the GPU is not estimated correctly.
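For context, the only knob available today is the runtime-wide multiplier. A minimal sketch of tuning it on a ServingRuntime, assuming it is exposed as an environment variable on the built-in adapter; the variable name MODEL_MULTIPLIER follows this issue's wording, and the real name and default in your installation may differ:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: triton-2.x
spec:
  supportedModelFormats:
    - name: onnx
      version: "1"
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:21.06.1-py3
  builtInAdapter:
    serverType: triton
    runtimeManagementPort: 8001
    env:
      # Assumed knob, named after this issue's terminology.
      # Note it applies to EVERY model served by this runtime,
      # not per model: estimated size = size on disk * 10.
      - name: MODEL_MULTIPLIER
        value: "10"
```

Because the multiplier is global to the runtime, no single value fits a mix of models whose GPU footprints diverge from their on-disk sizes by different ratios.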

We have already faced this issue using Triton as the serving runtime.

Describe your proposed solution A new parameter in the InferenceService CRD that lets the user specify the model size explicitly, bypassing the MODEL_MULTIPLIER factor used to estimate it.
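A minimal sketch of what this could look like on an InferenceService. The `serving.kserve.io/model-size` annotation is hypothetical (it illustrates the request, not an existing API); the deploymentMode annotation and the rest of the spec are standard:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-gpu-model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
    # Hypothetical annotation proposed by this issue: explicit size in
    # bytes, bypassing the disk-size * MODEL_MULTIPLIER heuristic.
    serving.kserve.io/model-size: "17179869184"  # 16 GiB
spec:
  predictor:
    model:
      modelFormat:
        name: onnx
      storageUri: s3://models/my-gpu-model
```

Whether this is expressed as an annotation or as a proper spec field is an open design question; the key point is that the value is per model rather than per runtime.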

mafs12 commented 3 weeks ago

I'm using Triton's Python backend to load models from Hugging Face Hub. The on-disk size of config.pbtxt and model.py is around 12K, so MODEL_MULTIPLIER effectively has to encode the average model size itself, which can vary from 5G to 25G! This discrepancy skews model placement decisions. We really need a better way to estimate or set the model size.
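To make this failure mode concrete, here is a sketch of the scenario (names, URI, and the "python" model format label are illustrative): the storage location holds only the Python-backend glue, so any disk-based estimate sees ~12K, while the weights fetched from Hugging Face Hub at load time occupy gigabytes of GPU memory:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: hf-python-model
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: python   # Triton Python backend
      runtime: triton-2.x
      # Storage contains only config.pbtxt and model.py (~12K on disk).
      # model.py downloads the real 5G-25G weights from Hugging Face Hub
      # when the model is loaded, invisible to the disk-size heuristic.
      storageUri: s3://models/hf-python-model
```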