kserve / modelmesh-serving


Failed to load model while following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' #504

Open JimBeam2019 opened 5 months ago

JimBeam2019 commented 5 months ago

Describe the bug

While following the tutorial 'Creating a custom serving runtime in KServe ModelMesh' from the IBM site, I tried to make one small adjustment: loading the sklearn `mnist-svm.joblib` model from `localMinIO` instead. However, the model fails to load with the following error:

`MLServer Adapter.MLServer Adapter Server.LoadModel MLServer failed to load model {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}`

I am not sure whether this is a bug or a mistake in my configuration. Please let me know if you need any further details; any advice is much appreciated.

To Reproduce

Steps to reproduce the behavior:

  1. Install ModelMesh Serving in a local minikube cluster following the installation instructions.
  2. Create the custom ML model with the code below (a local smoke test for this class follows the listing).

```python
from os.path import exists
import logging

import numpy as np
from joblib import load

from mlserver.model import MLModel
from mlserver.utils import get_model_uri
from mlserver.errors import InferenceError
from mlserver.codecs import DecodedParameterName
from mlserver.types import (
    InferenceRequest,
    InferenceResponse,
    ResponseOutput,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    "inputs": {"all": {"parameters": {DecodedParameterName, "headers"}}},
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]


class CustomMLModel(MLModel):
    async def load(self) -> bool:
        # Resolve the model file from the model URI (or a well-known
        # filename inside it) provided by MLServer.
        model_uri = await get_model_uri(
            self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES
        )
        logger.info(f"Model load URI: {model_uri}")

        if exists(model_uri):
            logger.info(f"Loading MNIST model from {model_uri}")
            self._model = load(model_uri)
            logger.info("Model loaded successfully")
        else:
            logger.info(f"Model does not exist at {model_uri}")
            self.ready = False
            return self.ready

        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        input_data = [inp.data for inp in payload.inputs]
        input_name = [inp.name for inp in payload.inputs]
        input_data_array = np.array(input_data)
        result = self._model.predict(input_data_array)
        predictions = np.array(result)

        logger.info(f"Predict result is: {result}")
        return InferenceResponse(
            id=payload.id,
            model_name=self.name,
            model_version=self.version,
            outputs=[
                ResponseOutput(
                    name=str(input_name[0]),
                    shape=list(predictions.shape),
                    datatype="INT64",
                    data=predictions.tolist(),
                )
            ],
        )
```
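As a sanity check outside the cluster, a minimal smoke test like the one below can exercise `CustomMLModel` directly. This is only a sketch: it assumes `mnist-svm.joblib` has been downloaded to a local `./models/` directory, that mlserver 1.3.2 is installed locally, and the 64-feature zero vector is just a dummy digits-style input.

```python
# Local smoke test for CustomMLModel -- a sketch, not part of the tutorial.
# Assumes ./models/mnist-svm.joblib exists and mlserver is installed.
import asyncio

from mlserver.settings import ModelParameters, ModelSettings
from mlserver.types import InferenceRequest, RequestInput

from custom_model import CustomMLModel


async def main():
    settings = ModelSettings(
        name="mnist-svm",
        implementation=CustomMLModel,
        # get_model_uri() resolves the well-known filename inside this dir
        parameters=ModelParameters(uri="./models"),
    )
    model = CustomMLModel(settings)
    assert await model.load(), "model failed to load"

    # Dummy input: one flattened 8x8 digits image (64 features).
    request = InferenceRequest(
        inputs=[
            RequestInput(
                name="predict",
                shape=[1, 64],
                datatype="FP32",
                data=[0.0] * 64,
            )
        ]
    )
    response = await model.predict(request)
    print(response.outputs[0].data)


asyncio.run(main())
```

If this passes locally, the failure is more likely in the image packaging or the ModelMesh wiring than in the model class itself.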
  3. Build a Docker image from the Dockerfile below, named `dev.local/xgb-model:dev.2405042123` (see the note after the Dockerfile).
```dockerfile
FROM python:3.9.13

RUN pip3 install --no-cache-dir mlserver==1.3.2 scikit-learn==1.4.0 joblib==1.3.2

COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt

ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
    MLSERVER_GRPC_PORT=8001 \
    MLSERVER_HTTP_PORT=8002 \
    MLSERVER_METRICS_PORT=8082 \
    MLSERVER_LOAD_MODELS_AT_STARTUP=false \
    MLSERVER_DEBUG=false \
    MLSERVER_PARALLEL_WORKERS=1 \
    MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
    # https://github.com/SeldonIO/MLServer/pull/748
    MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
    MLSERVER_MODEL_NAME=dummy-model

ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel

CMD ["mlserver", "start", "${MLSERVER_MODELS_DIR}"]
  4. Create a serving runtime with the YAML file below (a note on the port wiring follows the YAML).

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: custom-runtime-0.x
spec:
  supportedModelFormats:
    - name: custom-model
      version: "1"
      autoSelect: true
  protocolVersions:
    - grpc-v2
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: dev.local/xgb-model:dev.2405042123
      imagePullPolicy: IfNotPresent
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
        - name: MLSERVER_MODEL_IMPLEMENTATION
          value: "custom_model.CustomMLModel"
        - name: MLSERVER_DEBUG
          value: "true"
        - name: MLSERVER_MODEL_PARALLEL_WORKERS
          value: "0"
      resources:
        requests:
          cpu: "1"
          memory: "1Gi"
        limits:
          cpu: "2"
          memory: "1Gi"
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
```
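For what it's worth, the ports here have to line up: `grpcDataEndpoint` (8001) matches `MLSERVER_GRPC_PORT` and `builtInAdapter.runtimeManagementPort`, while `grpcEndpoint` (8085) is the port served by the ModelMesh adapter itself, which is consistent with the adapter log below (`"port": 8085, "MLServer port": 8001`).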
  5. Create an inference service with the YAML file below (a quick status check follows the YAML).

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: minio-model-isvc
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom-model
      runtime: custom-runtime-0.x
      storage:
        key: localMinIO
        path: sklearn/mnist-svm.joblib
```
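Once applied, `kubectl get isvc minio-model-isvc` should eventually report `READY True`; while it is failing, `kubectl describe isvc minio-model-isvc` should surface the ModelMesh loading status and is a quick way to confirm that the runtime and storage key were resolved.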
  6. After the ModelMesh pods start running, open the logs of the mlserver-adapter container.

Expected behavior

The model should have loaded successfully.

Screenshots

Instead, the mlserver-adapter container shows the logs below.

```
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Starting MLServer Adapter       {"adapter_config": {"Port":8085,"MLServerPort":8001,"MLServerContainerMemReqBytes":1073741824,"MLServerMemBufferBytes":134217728,"CapacityInBytes":939524096,"MaxLoadingConcurrency":1,"ModelLoadingTimeoutMS":90000,"DefaultModelSizeInBytes":1000000,"ModelSizeMultiplier":1.25,"RuntimeVersion":"dev.2405042123","LimitModelConcurrency":0,"RootModelDir":"/models/_mlserver_models","UseEmbeddedPuller":true}}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Created root MLServer model directory   {"path": "/models/_mlserver_models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Connecting to MLServer...       {"port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        Initializing Puller     {"Dir": "/models"}
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server        MLServer runtime adapter started
2024-05-04T14:00:34Z    INFO    MLServer Adapter.MLServer Adapter Server.client-cache   starting clean up of cached clients
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter will run at port        {"port": 8085, "MLServer port": 8001}
2024-05-04T14:00:34Z    INFO    MLServer Adapter        Adapter gRPC Server registered, now serving
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        Using runtime version returned by MLServer      {"version": "1.3.2"}
2024-05-04T14:00:44Z    INFO    MLServer Adapter.MLServer Adapter Server        runtimeStatus   {"Status": "status:READY capacityInBytes:939524096 maxLoadingConcurrency:1 modelLoadingTimeoutMs:90000 defaultModelSizeInBytes:1000000 runtimeVersion:\"1.3.2\" methodInfos:{key:\"inference.GRPCInferenceService/ModelInfer\" value:{idInjectionPath:1}} methodInfos:{key:\"inference.GRPCInferenceService/ModelMetadata\" value:{idInjectionPath:1}}"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Model details   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "modelType": "custom-model", "modelPath": "sklearn/mnist-svm.joblib"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        Reading storage credentials
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        creating new repository client  {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e"}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        found objects to download       {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "count": 1}
2024-05-04T14:00:52Z    DEBUG   MLServer Adapter.MLServer Adapter Server        downloading object      {"type": "s3", "cacheKey": "s3|0x33b60418eef4115e", "path": "sklearn/mnist-svm.joblib", "filename": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib"}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server        Calculated disk size    {"modelFullPath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "disk_size": 344817}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Generated model settings file   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "schemaPath": "", "implementation": ""}
2024-05-04T14:00:52Z    INFO    MLServer Adapter.MLServer Adapter Server.LoadModel      Adapted model directory for standalone file/dir {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "sourcePath": "/models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "isDir": false, "symLinkPath": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/mnist-svm.joblib", "generatedSettingsFile": "/models/_mlserver_models/multi-model-isvc__isvc-1ee2e56a33/model-settings.json"}
2024-05-04T14:00:52Z    ERROR   MLServer Adapter.MLServer Adapter Server.LoadModel      MLServer failed to load model   {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
github.com/kserve/modelmesh-runtime-adapter/model-mesh-mlserver-adapter/server.(*MLServerAdapterServer).LoadModel
        /opt/app/model-mesh-mlserver-adapter/server/server.go:137
github.com/kserve/modelmesh-runtime-adapter/internal/proto/mmesh._ModelRuntime_LoadModel_Handler
        /opt/app/internal/proto/mmesh/model-runtime_grpc.pb.go:206
google.golang.org/grpc.(*Server).processUnaryRPC
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1335
google.golang.org/grpc.(*Server).handleStream
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:1712
google.golang.org/grpc.(*Server).serveStreams.func1.1
        /root/go/pkg/mod/google.golang.org/grpc@v1.56.3/server.go:947
2024-05-04T14:00:53Z    INFO    MLServer Adapter.MLServer Adapter Server.UnloadModel    Unload request for model not found in MLServer  {"modelId": "multi-model-isvc__isvc-1ee2e56a33", "error": "rpc error: code = NotFound desc = Model multi-model-isvc__isvc-1ee2e56a33 not found"}
```
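One way to see what MLServer itself thinks is loaded (a debugging sketch, not something from the adapter docs) is the V2 model repository index endpoint. This assumes MLServer's HTTP port 8002 has been port-forwarded from the runtime pod, e.g. with `kubectl port-forward <runtime-pod> 8002:8002`:

```python
# Debugging sketch: ask MLServer for its model repository index via the
# V2 repository extension. Assumes localhost:8002 is port-forwarded to
# the mlserver container's HTTP port.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8002/v2/repository/index",
    data=json.dumps({}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    for model in json.loads(resp.read()):
        # Each entry should have at least "name" and "state"
        print(model.get("name"), model.get("state"))
```

If the `multi-model-isvc__isvc-...` model never appears in the index, the generated `model-settings.json` (see the `Generated model settings file` log line above, with an empty `implementation`) seems like a reasonable place to start looking.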

Environment (please complete the following information):

Additional context

ckadner commented 4 months ago

> I was trying to make a small adjustment, loading the sklearn mnist-svm.joblib model from the localMinIO instead.

Did the tutorial or example work without making changes?

@rafvasq -- can you spot something obvious? I would have to go through your tutorial myself and debug 😊