kserve / modelmesh-serving

Controller for ModelMesh
Apache License 2.0

The Python-Based Custom Runtime with MLServer cannot deploy a model stored on a Persistent Volume Claim #494

Closed zhlsunshine closed 3 months ago

zhlsunshine commented 3 months ago

Describe the bug
I store the trained model (mnist-svm.joblib in my case) on a PVC, and I have some extra logic to run on the trained model after it is loaded. Therefore, I need to write a custom ServingRuntime to handle it.

To Reproduce
It works well when I follow the doc Deploy a model stored on a Persistent Volume Claim: I can see the model file mnist-svm.joblib and model-settings.json under the folder /models/_mlserver_models/, as shown below: (screenshot attached)

However, since I want to write a custom ServingRuntime, I went on to follow the doc Python-Based Custom Runtime with MLServer, created a new ServingRuntime, and created an InferenceService for it. After all these steps everything is okay, except that the InferenceService never becomes True (Ready) because of a "NOT_FOUND" error in the inference service, as shown below: (screenshot attached)
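One common cause of a "NOT_FOUND" error with MLServer is a model-settings.json whose name or implementation does not match what the custom runtime and InferenceService expect. A hedged example of what such a file might look like for this case (the implementation class path is hypothetical, and the name would need to match the model name the InferenceService registers):

```json
{
  "name": "mnist-svm",
  "implementation": "custom_runtime.CustomJoblibRuntime",
  "parameters": {
    "uri": "./mnist-svm.joblib"
  }
}
```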

Based on this comparison, I think something goes wrong in the Python-Based Custom Runtime with MLServer when a Persistent Volume Claim is used to store the trained model.

Expected behavior

I hope there can be an explicit demo showing how to use a Python-based custom runtime with MLServer when the model is stored on a Persistent Volume Claim. Thanks a lot if that's possible!