nv-morpheus / Morpheus


[BUG]: Triton won't load log-parsing-onnx model #1727

Open nvawood opened 4 months ago

nvawood commented 4 months ago

Version

24.03

Which installation method(s) does this occur on?

Kubernetes

Describe the bug.

Deployed Morpheus from NGC using the Helm charts. Unable to deploy the log-parsing-onnx model.

Minimum reproducible example

export API_KEY="<NGC KEY>"
export NAMESPACE="morpheus"
export RELEASE="testing"

helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-ai-engine-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-mlflow-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar
helm fetch https://helm.ngc.nvidia.com/nvidia/morpheus/charts/morpheus-sdk-client-24.03.tgz --username='$oauthtoken' --password=${API_KEY} --untar

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-engine" morpheus-ai-engine
helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-helper" morpheus-sdk-client

(when Running)

kubectl -n "${NAMESPACE}" exec "sdk-cli-${RELEASE}-helper" -- cp -RL /workspace/models /common

helm install --set ngc.apiKey="${API_KEY}" --namespace "${NAMESPACE}" "${RELEASE}-mlflow" morpheus-mlflow

kubectl -n ${NAMESPACE} exec -it deploy/mlflow -- bash

python publish_model_to_mlflow.py \
      --model_name log-parsing-onnx \
      --model_directory /common/models/triton-model-repo/log-parsing-onnx \
      --flavor triton

mlflow deployments create -t triton \
      --flavor triton \
      --name log-parsing-onnx \
      -m models:/log-parsing-onnx/1 \
      -C "version=1"

Relevant log output

Triton Logs

I0604 16:12:32.064069 1 model_lifecycle.cc:469] loading: log-parsing-onnx:1
I0604 16:12:32.070497 1 onnxruntime.cc:2725] TRITONBACKEND_ModelInitialize: log-parsing-onnx (version 1)
I0604 16:12:32.071754 1 onnxruntime.cc:698] skipping model configuration auto-complete for 'log-parsing-onnx': inputs and outputs already specified
I0604 16:12:32.088522 1 onnxruntime.cc:2790] TRITONBACKEND_ModelInstanceInitialize: log-parsing-onnx_0 (GPU device 0)
2024-06-04 16:12:32.602428982 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-06-04 16:12:32.602460352 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
I0604 16:12:32.838257 1 onnxruntime.cc:2842] TRITONBACKEND_ModelInstanceFinalize: delete instance state
E0604 16:12:32.838421 1 backend_model.cc:691] ERROR: Failed to create instance: model 'log-parsing-onnx', tensor 'output': the model expects 3 dimensions (shape [-1,256,23]) but the model configuration specifies 3 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,-1,23])
I0604 16:12:32.838486 1 onnxruntime.cc:2766] TRITONBACKEND_ModelFinalize: delete model state
E0604 16:12:32.838550 1 model_lifecycle.cc:638] failed to load 'log-parsing-onnx' version 1: Invalid argument: model 'log-parsing-onnx', tensor 'output': the model expects 3 dimensions (shape [-1,256,23]) but the model configuration specifies 3 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,-1,23])
I0604 16:12:32.838589 1 model_lifecycle.cc:773] failed to load 'log-parsing-onnx'

Deployment Creation Logs

Successfully registered model 'log-parsing-onnx'.
Created version '1' of model 'log-parsing-onnx'.
/mlflow/artifacts/0/7d4f5b3ad12e4e228e267ecfc41b9421/artifacts
Saved mlflow-meta.json to /common/triton-model-repo/log-parsing-onnx
Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 115, in create_deployment
    self.triton_client.load_model(name)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_client.py", line 669, in load_model
    _raise_if_error(response)
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/tritonclient/http/_utils.py", line 69, in _raise_if_error
    raise error
tritonclient.utils.InferenceServerException: [400] load failed for model 'log-parsing-onnx': version 1 is at UNAVAILABLE state: Invalid argument: model 'log-parsing-onnx', tensor 'output': the model expects 3 dimensions (shape [-1,256,23]) but the model configuration specifies 3 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,-1,23]);

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/envs/mlflow/bin/mlflow", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow/deployments/cli.py", line 151, in create_deployment
    deployment = client.create_deployment(name, model_uri, flavor, config=config_dict)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/mlflow/lib/python3.11/site-packages/mlflow_triton/deployments.py", line 117, in create_deployment
    raise MlflowException(str(ex))
mlflow.exceptions.MlflowException: [400] load failed for model 'log-parsing-onnx': version 1 is at UNAVAILABLE state: Invalid argument: model 'log-parsing-onnx', tensor 'output': the model expects 3 dimensions (shape [-1,256,23]) but the model configuration specifies 3 dimensions (an initial batch dimension because max_batch_size > 0 followed by the explicit tensor shape, making complete shape [-1,-1,23]);

Full env printout

No response

Other/Misc.

No response

Code of Conduct

efajardo-nv commented 3 months ago

Hi @nvawood. I'm not able to reproduce this using the model from the repo. For me, Triton correctly detects the model with shape [-1,-1,23] which matches the model config. Not sure how Triton is seeing [-1,256,23] for you. Could you confirm that you're trying to deploy the log-parsing-onnx model from the repo?

efajardo-nv commented 3 months ago

I was able to reproduce the error after updating to Triton 24.03; I was using 23.06 before. I had Triton auto-generate a new model config. The output config now needs to look like this (the dims were updated):

output [
    {
        name: "output"
        data_type: TYPE_FP32
        dims: [ 256, 23 ]
    }
]
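For context on why this change resolves the error: when `max_batch_size > 0`, Triton prepends a variable batch dimension (-1) to the `dims` in the config, so `[256, 23]` yields a complete shape of `[-1, 256, 23]`, which matches what the ONNX model reports. A minimal sketch of that composition (the `complete_shape` helper and the `max_batch_size` value of 8 are illustrative, not Triton code or the actual Morpheus config):

```python
def complete_shape(max_batch_size, dims):
    # Mimic how Triton composes a tensor's complete shape from config.pbtxt:
    # when max_batch_size > 0, a variable batch dimension (-1) is prepended
    # to the dims listed in the config.
    return [-1] + list(dims) if max_batch_size > 0 else list(dims)

# The ONNX model reports its output shape as [-1, 256, 23].
# Old config dims [-1, 23] -> complete shape [-1, -1, 23], which 24.03 rejects:
print(complete_shape(8, [-1, 23]))   # [-1, -1, 23]

# Updated config dims [256, 23] -> complete shape [-1, 256, 23], which matches:
print(complete_shape(8, [256, 23]))  # [-1, 256, 23]
```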

This update, however, causes the model to fail to load in Triton 23.06.
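For anyone hitting this on Triton 24.03 who wants to patch the shipped config in place before publishing the model, a rough sketch of such a rewrite (the `update_output_dims` helper and the assumption of a single `output` block with this layout are mine, not part of Morpheus):

```python
import re

def update_output_dims(config_text, new_dims):
    # Rewrite the dims entry of the "output" block in a Triton
    # config.pbtxt-style string. Assumes one output block whose dims
    # line follows the name/data_type lines.
    dims_str = "[ " + ", ".join(str(d) for d in new_dims) + " ]"
    return re.sub(
        r'(name:\s*"output".*?dims:\s*)\[[^\]]*\]',
        lambda m: m.group(1) + dims_str,
        config_text,
        count=1,
        flags=re.DOTALL,
    )

old_config = '''
output [
    {
        name: "output"
        data_type: TYPE_FP32
        dims: [ -1, 23 ]
    }
]
'''
print(update_output_dims(old_config, [256, 23]))
```

Since the two Triton versions want different dims, a conditional patch like this (or maintaining two config variants) seems to be the workaround until the repo config is updated.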