andreapairon opened 1 year ago
I'm using Triton's Python backend to load models from Hugging Face Hub. The on-disk size of the model repository (just config.pbtxt and model.py) is around 12K, so MODEL_MULTIPLIER effectively has to stand in for the average model size, which can vary from 5G to 25G!
This mismatch skews model placement decisions. We really need a better way to estimate, or explicitly set, the model size.
It would be nice to have a new parameter in the InferenceService CRD that lets the user specify the model size in bytes, avoiding the MODEL_MULTIPLIER factor used to estimate it.

Is your feature request related to a problem? If so, please describe.
The heuristic used to calculate the model size (model size on disk * MODEL_MULTIPLIER) is not always accurate: the amount of memory a model actually uses on the GPU can be larger, which can lead to OOM errors. Because of this, the total number of models that can stay loaded on the GPU is not estimated correctly. We have already faced this issue using Triton as the serving runtime.
Describe your proposed solution
A new parameter in the InferenceService CRD that allows the user to specify the model size, avoiding the MODEL_MULTIPLIER factor used to estimate it. A sketch of how this could look is shown below.
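For illustration only, a minimal sketch of what such a field might look like on a v1beta1 InferenceService. The modelSize field is hypothetical (it does not exist in the current CRD), and the model format, name, and storageUri values are placeholders:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-hf-model
spec:
  predictor:
    model:
      modelFormat:
        name: python            # placeholder: Triton Python backend model
      # Repository holds only config.pbtxt and model.py (~12K on disk);
      # the real weights are pulled from Hugging Face Hub at load time.
      storageUri: s3://models/example-hf-model
      # Hypothetical field proposed by this issue: explicit size in bytes,
      # used for placement instead of (size on disk * MODEL_MULTIPLIER).
      modelSize: "15000000000"  # e.g. ~15G actual in-memory footprint
```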