kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

Standardize KServe multi-model management SPI and add built-in support #159

Closed njhill closed 8 months ago

njhill commented 2 years ago

For dynamic loading/unloading of models, Triton defines a "Model Repository" API which is described as an extension to the KServe v2 dataplane API.

This includes both REST and gRPC variants of the following API endpoints:

POST v2/repository/index
POST v2/repository/models/${MODEL_NAME}/load
POST v2/repository/models/${MODEL_NAME}/unload
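
For illustration, the REST variant of these endpoints could be exercised roughly as follows. This is only a minimal sketch: the base URL `http://localhost:8000`, the helper names, and the empty JSON body are assumptions for the example and are not taken from this issue or from any particular server's documentation.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote


def repository_url(base_url: str, action: str, model_name: Optional[str] = None) -> str:
    """Build the URL for one of the three repository endpoints listed above."""
    base = base_url.rstrip("/")
    if action == "index":
        return f"{base}/v2/repository/index"
    if action in ("load", "unload"):
        if not model_name:
            raise ValueError(f"{action} requires a model name")
        # Percent-encode the name so it stays a single path segment.
        return f"{base}/v2/repository/models/{quote(model_name, safe='')}/{action}"
    raise ValueError(f"unknown action: {action}")


def build_request(base_url: str, action: str,
                  model_name: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with an empty JSON body (an assumption)."""
    return urllib.request.Request(
        repository_url(base_url, action, model_name),
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(base_url: str, action: str, model_name: Optional[str] = None,
         timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_request(base_url, action, model_name),
                                timeout=timeout) as resp:
        return resp.status
```

With a server listening on the assumed address, `call("http://localhost:8000", "load", "my-model")` would issue the load request; the two builder functions can be used without any server running.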

MLServer followed this and has implemented the same API, but unfortunately its gRPC service definition uses different service and package names.

ModelMesh uses these in its built-in support for Triton and MLServer to manage models in each Triton/MLServer instance, but the logic is currently mostly specific to each because of the differing service names and different filesystem layout requirements. Note that only the load/unload methods are used; index isn't required.
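
To make the service-name problem concrete: a gRPC method is addressed by the path `/<package>.<Service>/<Method>`, so two servers with identical messages but different package or service names still need different client stubs. A rough sketch of how one code path could be parameterized over those names is below. The `RepositoryService` class is hypothetical, the Triton names reflect my understanding of its proto and should be checked against it, and the second entry is a made-up placeholder rather than MLServer's real names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositoryService:
    """Names needed to address a model repository gRPC service (hypothetical helper)."""
    package: str
    service: str
    load_method: str = "RepositoryModelLoad"
    unload_method: str = "RepositoryModelUnload"

    def method_path(self, method: str) -> str:
        # Standard gRPC addressing: /<package>.<Service>/<Method>
        return f"/{self.package}.{self.service}/{method}"

    @property
    def load_path(self) -> str:
        return self.method_path(self.load_method)

    @property
    def unload_path(self) -> str:
        return self.method_path(self.unload_method)


# Assumed to match Triton's service definition; verify against the proto file.
TRITON = RepositoryService(package="inference", service="GRPCInferenceService")

# Placeholder names only, standing in for a server that chose different ones.
OTHER = RepositoryService(package="example.repository", service="ModelRepositoryService")

for svc in (TRITON, OTHER):
    print(svc.load_path, svc.unload_path)
```

Once the package and service names are agreed on, the second entry becomes unnecessary and a single generated stub covers every server that implements the API.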

This appears to be at least a de facto standard KServe API for model management, so it would make sense to support it as an option for other/custom model server implementations via our built-in adapter, as an alternative to implementing the native model-mesh gRPC model runtime SPI.

First though we should decide on the official/standard package and service name to use for the gRPC service, and copy its specification into the KServe repo somewhere.

njhill commented 2 years ago

Looks like MLServer has now standardized on the Triton package/service names: https://github.com/SeldonIO/MLServer/pull/616 :tada:

rafvasq commented 8 months ago

Closed by https://github.com/kserve/modelmesh-runtime-adapter/pull/45.