kserve / kserve

Standardized Serverless ML Inference Platform on Kubernetes
https://kserve.github.io/website/
Apache License 2.0

Triton model inference with yolov8 onnx #3513

Open Al4DIN opened 3 months ago

Al4DIN commented 3 months ago

/kind bug

What steps did you take and what happened: I saved my YOLOv8 model as an ONNX model and tried to follow the documentation for serving it with the Triton predictor. When I try to do a simple inference against the internal and external URL of the InferenceService, I get this error: InferenceServerException: [400] Request for unknown model: 'v2' is not found
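
For reference, the V2 REST endpoints involved look roughly like the sketch below. This is only an illustration: the host is the internal address from the status further down, input.json is a placeholder payload, and the exact host or Host header needed depends on whether the request goes through the ingress.

# model readiness and metadata (useful to check whether Triton loaded the model at all)
curl http://torchscript-yoleokcaldin8.admin.svc.cluster.local/v2/models/torchscript-yoleokcaldin8/ready
curl http://torchscript-yoleokcaldin8.admin.svc.cluster.local/v2/models/torchscript-yoleokcaldin8

# inference goes to the /infer path shown in status.address.url
curl -X POST -d @input.json http://torchscript-yoleokcaldin8.admin.svc.cluster.local/v2/models/torchscript-yoleokcaldin8/infer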

What did you expect to happen: Be able to do inference with the Triton predictor.

What's the InferenceService yaml:

apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"serving.kserve.io/v1beta1","kind":"InferenceService","metadata":{"annotations":{},"name":"torchscript-yoleokcaldin8","namespace":"admin"},"spec":{"predictor":{"serviceAccountName":"sa-minio-kserve","triton":{"runtimeVersion":"21.08-py3","storageUri":"s3://aladin/onnx"}}}}
  creationTimestamp: "2024-02-29T09:19:23Z"
  finalizers:
  - inferenceservice.finalizers
  generation: 5
  labels:
    serviceEnvelope: kserve
  name: torchscript-yoleokcaldin8
  namespace: admin
  resourceVersion: "44968177525"
  uid: face3fdc-3590-411a-99b7-20a0a61fa4ba
spec:
  predictor:
    model:
      modelFormat:
        name: triton
      name: ""
      resources: {}
      runtime: kserve-tritonserver
      runtimeVersion: 21.08-py3
      storageUri: s3://aladin/onnx
    serviceAccountName: sa-minio-kserve
status:
  address:
    url: http://torchscript-yoleokcaldin8.admin.svc.cluster.local/v2/models/torchscript-yoleokcaldin8/infer
  components:
    predictor:
      address:
        url: http://torchscript-yoleokcaldin8-predictor-default.admin.svc.cluster.local/
      latestCreatedRevision: torchscript-yoleokcaldin8-predictor-default-00005
      latestReadyRevision: torchscript-yoleokcaldin8-predictor-default-00005
      latestRolledoutRevision: torchscript-yoleokcaldin8-predictor-default-00005
      previousRolledoutRevision: torchscript-yoleokcaldin8-predictor-default-00001
      traffic:
      - latestRevision: true
        percent: 100
        revisionName: torchscript-yoleokcaldin8-predictor-default-00005
      url: http://external_url/
  conditions:
  - lastTransitionTime: "2024-03-07T09:24:01Z"
    status: "True"
    type: IngressReady
  - lastTransitionTime: "2024-03-07T09:24:00Z"
    severity: Info
    status: "True"
    type: PredictorConfigurationReady
  - lastTransitionTime: "2024-03-07T09:24:01Z"
    status: "True"
    type: PredictorReady
  - lastTransitionTime: "2024-03-07T09:24:01Z"
    severity: Info
    status: "True"
    type: PredictorRouteReady
  - lastTransitionTime: "2024-03-07T09:24:01Z"
    status: "True"
    type: Ready
  url: http://external_url/

Anything else you would like to add:

Checking the logs, I can verify that the ONNX model was copied to the /mnt/models path and the InferenceService shows as ready in the KServe panel in Kubeflow, but in the kserve container logs Triton's model status table is empty:

+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0312 09:18:11.466345 1 tritonserver.cc:1718] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.13.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /mnt/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 1                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0312 09:18:11.470316 1 grpc_server.cc:4111] Started GRPCInferenceService at 0.0.0.0:9000
I0312 09:18:11.470803 1 http_server.cc:2803] Started HTTPService at 0.0.0.0:8080
I0312 09:18:11.513379 1 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002
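
Since server_extensions above includes model_repository, one way to confirm what Triton actually discovered under /mnt/models is to query the repository index from inside the predictor pod. A sketch, assuming curl is available in the image; the pod name is a placeholder:

# find the predictor pod first: kubectl get pods -n admin
kubectl exec -n admin <predictor-pod> -c kserve-container -- \
  curl -s -X POST localhost:8080/v2/repository/index

# an empty [] response means Triton found no model directories under /mnt/models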

Environment:

sivanantha321 commented 3 months ago

@Al4DIN Have you followed the required directory structure for the model?

torchscript/
  cifar/
    config.pbtxt
    1/
      model.pt

The example above is from the KServe docs.
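
For an ONNX model, the equivalent layout would typically look like this (the repository root is what storageUri points at; the directory names here are illustrative, and Triton's onnxruntime backend expects the file to be named model.onnx by default):

onnx/
  yolov8/
    config.pbtxt
    1/
      model.onnx

Also, since your Triton logs show strict_model_config = 1, a config.pbtxt is required alongside the version directory. A minimal sketch, assuming the default input/output names of a standard Ultralytics YOLOv8 export; verify the names and dims against your own export (e.g. with Netron):

name: "yolov8"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "images"            # assumed default export input name
    data_type: TYPE_FP32
    dims: [ 1, 3, 640, 640 ]
  }
]
output [
  {
    name: "output0"           # assumed default export output name
    data_type: TYPE_FP32
    dims: [ 1, 84, 8400 ]     # depends on model variant/class count; check your export
  }
]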