triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Model unloaded after successful load #6413

Closed: tomaszstachera closed this issue 1 year ago

tomaszstachera commented 1 year ago

Description: After a model loads successfully, it is unloaded with no explanation of why. Logs:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 21.08 (build 26170506)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
find: '/usr/lib/ssl/private': Permission denied

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use Docker with NVIDIA Container Toolkit to start this container; see
   https://github.com/NVIDIA/nvidia-docker.

I1011 08:02:58.581948 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/pytorch/libtriton_pytorch.so
I1011 08:02:58.871010 1 libtorch.cc:1029] TRITONBACKEND_Initialize: pytorch
I1011 08:02:58.871036 1 libtorch.cc:1039] Triton TRITONBACKEND API version: 1.4
I1011 08:02:58.871041 1 libtorch.cc:1045] 'pytorch' TRITONBACKEND API version: 1.4
I1011 08:02:58.871104 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so
2023-10-11 08:02:59.020382: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I1011 08:02:59.061321 1 tensorflow.cc:2169] TRITONBACKEND_Initialize: tensorflow
I1011 08:02:59.061353 1 tensorflow.cc:2179] Triton TRITONBACKEND API version: 1.4
I1011 08:02:59.061361 1 tensorflow.cc:2185] 'tensorflow' TRITONBACKEND API version: 1.4
I1011 08:02:59.061368 1 tensorflow.cc:2209] backend configuration:
{}
I1011 08:02:59.061417 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so
I1011 08:02:59.062822 1 onnxruntime.cc:1970] TRITONBACKEND_Initialize: onnxruntime
I1011 08:02:59.062848 1 onnxruntime.cc:1980] Triton TRITONBACKEND API version: 1.4
I1011 08:02:59.062854 1 onnxruntime.cc:1986] 'onnxruntime' TRITONBACKEND API version: 1.4
I1011 08:02:59.072355 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/openvino/libtriton_openvino.so
I1011 08:02:59.083279 1 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I1011 08:02:59.083299 1 openvino.cc:1203] Triton TRITONBACKEND API version: 1.4
I1011 08:02:59.083306 1 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.4
W1011 08:02:59.083418 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1011 08:02:59.083446 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
I1011 08:02:59.083466 1 backend_factory.h:45] Create TritonBackendFactory
I1011 08:02:59.083481 1 plan_backend_factory.cc:49] Create PlanBackendFactory
I1011 08:02:59.083491 1 plan_backend_factory.cc:56] Registering TensorRT Plugins
I1011 08:02:59.083534 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I1011 08:02:59.083554 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I1011 08:02:59.083567 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I1011 08:02:59.083580 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I1011 08:02:59.083591 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I1011 08:02:59.083608 1 logging.cc:52] Registered plugin creator - ::Clip_TRT version 1
I1011 08:02:59.083623 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I1011 08:02:59.083633 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I1011 08:02:59.083650 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I1011 08:02:59.083661 1 logging.cc:52] Registered plugin creator - ::ScatterND version 1
I1011 08:02:59.083677 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I1011 08:02:59.083688 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I1011 08:02:59.083697 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I1011 08:02:59.083710 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I1011 08:02:59.083722 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I1011 08:02:59.083735 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I1011 08:02:59.083748 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_ONNX_TRT version 1
I1011 08:02:59.083761 1 logging.cc:52] Registered plugin creator - ::EfficientNMS_TRT version 1
I1011 08:02:59.083772 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I1011 08:02:59.083787 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I1011 08:02:59.083797 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I1011 08:02:59.083813 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I1011 08:02:59.083827 1 logging.cc:52] Registered plugin creator - ::Split version 1
I1011 08:02:59.083836 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I1011 08:02:59.083856 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I1011 08:02:59.083870 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I1011 08:02:59.084011 1 autofill.cc:138] TensorFlow SavedModel autofill: Internal: unable to autofill for '1' due to no version directories
I1011 08:02:59.084034 1 autofill.cc:151] TensorFlow GraphDef autofill: Internal: unable to autofill for '1' due to no version directories
I1011 08:02:59.084056 1 autofill.cc:164] PyTorch autofill: Internal: unable to autofill for '1' due to no version directories
I1011 08:02:59.084083 1 autofill.cc:196] ONNX autofill: Internal: unable to autofill for '1' due to no version directories
I1011 08:02:59.084111 1 autofill.cc:209] TensorRT autofill: Internal: unable to autofill for '1' due to no version directories
W1011 08:02:59.084120 1 autofill.cc:243] Proceeding with simple config for now
I1011 08:02:59.084125 1 model_config_utils.cc:637] autofilled config: name: "1"

E1011 08:02:59.084688 1 model_repository_manager.cc:1919] Poll failed for model directory '1': unexpected platform type  for 1
I1011 08:02:59.084923 1 autofill.cc:138] TensorFlow SavedModel autofill: Internal: unable to autofill for 'hotspot-detect', unable to find savedmodel directory named 'model.savedmodel'
I1011 08:02:59.084995 1 autofill.cc:151] TensorFlow GraphDef autofill: Internal: unable to autofill for 'hotspot-detect', unable to find graphdef file named 'model.graphdef'
I1011 08:02:59.085043 1 autofill.cc:164] PyTorch autofill: Internal: unable to autofill for 'hotspot-detect', unable to find PyTorch file named 'model.pt'
I1011 08:02:59.085103 1 autofill.cc:196] ONNX autofill: Internal: unable to autofill for 'hotspot-detect', unable to find onnx file or directory named 'model.onnx'
W1011 08:02:59.085417 1 logging.cc:46] Unable to determine GPU memory usage
W1011 08:02:59.085465 1 logging.cc:46] Unable to determine GPU memory usage
I1011 08:02:59.085481 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 59, GPU 0 (MiB)
E1011 08:02:59.085540 1 logging.cc:40] [runtime.cpp::isCudaInstalledCorrectly::38] Error Code 6: Internal Error (CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
W1011 08:02:59.085607 1 logging.cc:46] Unable to determine GPU memory usage
W1011 08:02:59.085657 1 logging.cc:46] Unable to determine GPU memory usage
I1011 08:02:59.085670 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 59, GPU 0 (MiB)
E1011 08:02:59.085694 1 logging.cc:40] [runtime.cpp::isCudaInstalledCorrectly::38] Error Code 6: Internal Error (CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
W1011 08:02:59.085754 1 logging.cc:46] Unable to determine GPU memory usage
W1011 08:02:59.085798 1 logging.cc:46] Unable to determine GPU memory usage
I1011 08:02:59.085813 1 logging.cc:49] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 59, GPU 0 (MiB)
E1011 08:02:59.085837 1 logging.cc:40] [runtime.cpp::isCudaInstalledCorrectly::38] Error Code 6: Internal Error (CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
I1011 08:02:59.085850 1 autofill.cc:209] TensorRT autofill: Internal: unable to autofill for 'hotspot-detect', unable to find a compatible plan file.
W1011 08:02:59.085857 1 autofill.cc:243] Proceeding with simple config for now
I1011 08:02:59.085863 1 model_config_utils.cc:637] autofilled config: name: "hotspot-detect"
version_policy {
  all {
  }
}
max_batch_size: 1
input {
  name: "IMAGE"
  data_type: TYPE_UINT8
  dims: -1
  dims: -1
  dims: 3
}
input {
  name: "AREA_OF_INTEREST"
  data_type: TYPE_INT64
  dims: -1
  dims: 2
}
output {
  name: "AREA_OF_INTEREST_COUNT"
  data_type: TYPE_UINT32
  dims: 1
  dims: 1
}
output {
  name: "AREA_OF_INTEREST_MIN"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
}
output {
  name: "AREA_OF_INTEREST_MAX"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
}
output {
  name: "AREA_OF_INTEREST_MEAN"
  data_type: TYPE_FP32
  dims: 1
  dims: 1
}
parameters {
  key: "EXECUTION_ENV_PATH"
  value {
    string_value: "/mnt/models/HotspotDetectEnv.tar.gz"
  }
}
backend: "python"

I1011 08:02:59.085941 1 model_repository_manager.cc:749] AsyncLoad() 'hotspot-detect'
I1011 08:02:59.085995 1 model_repository_manager.cc:988] TriggerNextAction() 'hotspot-detect' version 1: 1
I1011 08:02:59.086010 1 model_repository_manager.cc:1026] Load() 'hotspot-detect' version 1
I1011 08:02:59.086016 1 model_repository_manager.cc:1045] loading: hotspot-detect:1
I1011 08:02:59.186263 1 model_repository_manager.cc:1105] CreateInferenceBackend() 'hotspot-detect' version 1
I1011 08:02:59.186383 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/python/libtriton_python.so
I1011 08:02:59.187946 1 python.cc:1523] 'python' TRITONBACKEND API version: 1.4
I1011 08:02:59.187967 1 python.cc:1545] backend configuration:
{}
I1011 08:02:59.187984 1 python.cc:1622] Shared memory configuration is shm-default-byte-size=67108864,shm-growth-byte-size=67108864,stub-timeout-seconds=30
I1011 08:02:59.188102 1 python.cc:1670] TRITONBACKEND_ModelInitialize: hotspot-detect (version 1)
I1011 08:02:59.189088 1 model_config_utils.cc:1524] ModelConfig 64-bit fields:
I1011 08:02:59.189104 1 model_config_utils.cc:1526]     ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I1011 08:02:59.189110 1 model_config_utils.cc:1526]     ModelConfig::dynamic_batching::max_queue_delay_microseconds
I1011 08:02:59.189115 1 model_config_utils.cc:1526]     ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I1011 08:02:59.189120 1 model_config_utils.cc:1526]     ModelConfig::ensemble_scheduling::step::model_version
I1011 08:02:59.189125 1 model_config_utils.cc:1526]     ModelConfig::input::dims
I1011 08:02:59.189131 1 model_config_utils.cc:1526]     ModelConfig::input::reshape::shape
I1011 08:02:59.189137 1 model_config_utils.cc:1526]     ModelConfig::instance_group::secondary_devices::device_id
I1011 08:02:59.189143 1 model_config_utils.cc:1526]     ModelConfig::model_warmup::inputs::value::dims
I1011 08:02:59.189150 1 model_config_utils.cc:1526]     ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I1011 08:02:59.189156 1 model_config_utils.cc:1526]     ModelConfig::optimization::cuda::graph_spec::input::value::dim
I1011 08:02:59.189162 1 model_config_utils.cc:1526]     ModelConfig::output::dims
I1011 08:02:59.189168 1 model_config_utils.cc:1526]     ModelConfig::output::reshape::shape
I1011 08:02:59.189174 1 model_config_utils.cc:1526]     ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I1011 08:02:59.189180 1 model_config_utils.cc:1526]     ModelConfig::sequence_batching::max_sequence_idle_microseconds
I1011 08:02:59.189186 1 model_config_utils.cc:1526]     ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I1011 08:02:59.189192 1 model_config_utils.cc:1526]     ModelConfig::version_policy::specific::versions
I1011 08:02:59.189309 1 python.cc:1462] Using Python execution env /mnt/models/HotspotDetectEnv.tar.gz
I1011 08:02:59.189473 1 python.cc:1714] TRITONBACKEND_ModelInstanceInitialize: hotspot-detect (CPU device 0)
I1011 08:02:59.189486 1 backend_model_instance.cc:68] Creating instance hotspot-detect on CPU using artifact ''
I1011 08:03:01.481239 23 python.cc:1073] Starting Python backend stub: source /tmp/python_env_UQHlCg/0/bin/activate && exec env LD_LIBRARY_PATH=/tmp/python_env_UQHlCg/0/lib:$LD_LIBRARY_PATH /opt/tritonserver/backends/python/triton_python_backend_stub /mnt/models/hotspot-detect/1/model.py /hotspot-detect_CPU_0 67108864 67108864 1 /opt/tritonserver/backends/python 16 hotspot-detect
Matplotlib created a temporary cache directory at /tmp/matplotlib-wqzon0db because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
I1011 08:03:02.423208 1 python.cc:1735] TRITONBACKEND_ModelInstanceInitialize: instance initialization successful hotspot-detect (device 0)
I1011 08:03:02.423452 1 dynamic_batch_scheduler.cc:230] Starting dynamic-batch scheduler thread 0 at nice 5...
I1011 08:03:02.423517 1 model_repository_manager.cc:1212] successfully loaded 'hotspot-detect' version 1
I1011 08:03:02.423601 1 model_repository_manager.cc:988] TriggerNextAction() 'hotspot-detect' version 1: 0
I1011 08:03:02.423615 1 model_repository_manager.cc:1003] no next action, trigger OnComplete()
I1011 08:03:02.423659 1 model_repository_manager.cc:594] VersionStates() 'hotspot-detect'
I1011 08:03:02.423774 1 server.cc:504]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1011 08:03:02.423881 1 server.cc:543]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| tensorrt    | <built-in>                                                      | {}     |
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| python      | /opt/tritonserver/backends/python/libtriton_python.so           | {}     |
+-------------+-----------------------------------------------------------------+--------+

I1011 08:03:02.423896 1 model_repository_manager.cc:570] BackendStates()
I1011 08:03:02.423927 1 server.cc:586]
+----------------+---------+--------+
| Model          | Version | Status |
+----------------+---------+--------+
| hotspot-detect | 1       | READY  |
+----------------+---------+--------+

I1011 08:03:02.424027 1 tritonserver.cc:1718]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                 |
| server_version                   | 2.13.0                                                                                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | /mnt/models                                                                                                                                                                            |
| model_control_mode               | MODE_NONE                                                                                                                                                                              |
| strict_model_config              | 0                                                                                                                                                                                      |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                              |
| min_supported_compute_capability | 6.0                                                                                                                                                                                    |
| strict_readiness                 | 1                                                                                                                                                                                      |
| exit_timeout                     | 30                                                                                                                                                                                     |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1011 08:03:02.424042 1 server.cc:234] Waiting for in-flight requests to complete.
I1011 08:03:02.424048 1 model_repository_manager.cc:694] AsyncUnload() 'hotspot-detect'
I1011 08:03:02.424055 1 model_repository_manager.cc:988] TriggerNextAction() 'hotspot-detect' version 1: 2
I1011 08:03:02.424062 1 model_repository_manager.cc:1071] Unload() 'hotspot-detect' version 1
I1011 08:03:02.424069 1 model_repository_manager.cc:1078] unloading: hotspot-detect:1
I1011 08:03:02.424145 1 model_repository_manager.cc:534] LiveBackendStates()
I1011 08:03:02.424157 1 server.cc:249] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I1011 08:03:02.424164 1 server.cc:256] hotspot-detect v1: UNLOADING
I1011 08:03:02.424208 1 dynamic_batch_scheduler.cc:465] Stopping dynamic-batch scheduler thread 0...
I1011 08:03:02.424307 1 python.cc:1783] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 08:03:03.424243 1 model_repository_manager.cc:534] LiveBackendStates()
I1011 08:03:03.424273 1 server.cc:249] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I1011 08:03:03.424279 1 server.cc:256] hotspot-detect v1: UNLOADING
I1011 08:03:03.519426 1 python.cc:1693] TRITONBACKEND_ModelFinalize: delete model state
I1011 08:03:03.519467 1 triton_backend_manager.cc:101] unloading backend 'python'
I1011 08:03:03.519472 1 python.cc:1650] TRITONBACKEND_Finalize: Start
I1011 08:03:03.837482 1 python.cc:1655] TRITONBACKEND_Finalize: End
I1011 08:03:03.838631 1 model_repository_manager.cc:1193] OnDestroy callback() 'hotspot-detect' version 1
I1011 08:03:03.838653 1 model_repository_manager.cc:1195] successfully unloaded 'hotspot-detect' version 1
I1011 08:03:03.838658 1 model_repository_manager.cc:988] TriggerNextAction() 'hotspot-detect' version 1: 0
I1011 08:03:04.424368 1 model_repository_manager.cc:534] LiveBackendStates()
I1011 08:03:04.424401 1 server.cc:249] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
I1011 08:03:04.424408 1 triton_backend_manager.cc:101] unloading backend 'pytorch'
I1011 08:03:04.424420 1 triton_backend_manager.cc:101] unloading backend 'tensorflow'
I1011 08:03:04.424431 1 triton_backend_manager.cc:101] unloading backend 'onnxruntime'
I1011 08:03:04.424461 1 triton_backend_manager.cc:101] unloading backend 'openvino'
error: creating server: Internal - failed to load all models
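
For context, the auto-filled config above describes a Python-backend model whose entry point, per the stub command in the log, is /mnt/models/hotspot-detect/1/model.py. A minimal sketch of the interface such a model.py must implement against the Triton Python backend API follows; the actual hotspot logic is not part of the issue, so the statistics below are placeholders computed over the whole image:

import json

import numpy as np
import triton_python_backend_utils as pb_utils  # provided by the Triton Python backend


class TritonPythonModel:
    def initialize(self, args):
        # The effective (auto-filled) model config is passed in as JSON.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            image = pb_utils.get_input_tensor_by_name(request, "IMAGE").as_numpy()
            areas = pb_utils.get_input_tensor_by_name(request, "AREA_OF_INTEREST").as_numpy()

            # Placeholder statistics; the real per-area-of-interest logic
            # would go here.
            pixels = image.astype(np.float32)
            shape = (1, 1, 1)  # [batch, 1, 1], given max_batch_size: 1
            outputs = [
                pb_utils.Tensor("AREA_OF_INTEREST_COUNT",
                                np.full(shape, areas.shape[1], dtype=np.uint32)),
                pb_utils.Tensor("AREA_OF_INTEREST_MIN",
                                np.full(shape, pixels.min(), dtype=np.float32)),
                pb_utils.Tensor("AREA_OF_INTEREST_MAX",
                                np.full(shape, pixels.max(), dtype=np.float32)),
                pb_utils.Tensor("AREA_OF_INTEREST_MEAN",
                                np.full(shape, pixels.mean(), dtype=np.float32)),
            ]
            responses.append(pb_utils.InferenceResponse(output_tensors=outputs))
        return responses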

Triton Information: nvcr.io/nvidia/tritonserver:21.08-py3

Are you using the Triton container or did you build it yourself? nvcr.io/nvidia/tritonserver:21.08-py3

To Reproduce: deploy the inference server with the following SeldonDeployment manifest:

apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: ts-triton-4
  namespace: dhinesh
spec:
  predictors:
  - componentSpecs:
    - spec:
        volumes:
          - name: seldon-podinfo
            downwardAPI:
              items:
                - path: annotations
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.annotations
              defaultMode: 420
          - name: hotspot-detect-provision-location
            emptyDir: {}
          - name: kube-api-access-xtxmz
            projected:
              sources:
                - serviceAccountToken:
                    expirationSeconds: 3607
                    path: token
                - configMap:
                    name: kube-root-ca.crt
                    items:
                      - key: ca.crt
                        path: ca.crt
                - downwardAPI:
                    items:
                      - path: namespace
                        fieldRef:
                          apiVersion: v1
                          fieldPath: metadata.namespace
              defaultMode: 420
        initContainers:
          - name: hotspot-detect-model-initializer
            image: seldonio/rclone-storage-initializer:1.17.1
            args:
              - s3://hotspot-detect/ts-models/
              - /mnt/models
            envFrom:
              - secretRef:
                  name: seldon-rclone-secret
            resources:
              limits:
                cpu: '1'
                memory: 1Gi
              requests:
                cpu: 100m
                memory: 100Mi
            volumeMounts:
              - name: hotspot-detect-provision-location
                mountPath: /mnt/models
              - name: kube-api-access-xtxmz
                readOnly: true
                mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            imagePullPolicy: IfNotPresent
        containers:
          - name: hotspot-detect
            image: nvcr.io/nvidia/tritonserver:21.08-py3
            args:
              - /opt/tritonserver/bin/tritonserver
              - '--grpc-port=9500'
              - '--http-port=9000'
              - '--model-repository=/mnt/models'
              - '--strict-model-config=false'
              - '--log-verbose=1'
            ports:
              - name: grpc
                containerPort: 9500
                protocol: TCP
              - name: http
                containerPort: 9000
                protocol: TCP
            env:
              - name: PREDICTIVE_UNIT_SERVICE_PORT
                value: '9000'
              - name: PREDICTIVE_UNIT_HTTP_SERVICE_PORT
                value: '9000'
              - name: MLSERVER_HTTP_PORT
                value: '9000'
              - name: PREDICTIVE_UNIT_GRPC_SERVICE_PORT
                value: '9500'
              - name: MLSERVER_GRPC_PORT
                value: '9500'
              - name: MLSERVER_MODEL_URI
                value: /mnt/models
              - name: PREDICTIVE_UNIT_ID
                value: hotspot-detect
              - name: MLSERVER_MODEL_NAME
                value: hotspot-detect
              - name: PREDICTIVE_UNIT_IMAGE
                value: nvcr.io/nvidia/tritonserver:21.08-py3
              - name: PREDICTOR_ID
                value: default
              - name: PREDICTOR_LABELS
                value: '{"sidecar.istio.io/inject":"false","version":"default"}'
              - name: SELDON_DEPLOYMENT_ID
                value: ts-triton-4
              - name: SELDON_EXECUTOR_ENABLED
                value: 'true'
              - name: PREDICTIVE_UNIT_METRICS_SERVICE_PORT
                value: '6000'
              - name: PREDICTIVE_UNIT_METRICS_ENDPOINT
                value: /prometheus
              - name: MLSERVER_METRICS_PORT
                value: '6000'
              - name: MLSERVER_METRICS_ENDPOINT
                value: /prometheus
            resources: {}
            volumeMounts:
              - name: seldon-podinfo
                mountPath: /etc/podinfo
              - name: hotspot-detect-provision-location
                readOnly: true
                mountPath: /mnt/models
              - name: kube-api-access-xtxmz
                readOnly: true
                mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            imagePullPolicy: IfNotPresent
          - name: seldon-container-engine
            image: docker.io/seldonio/seldon-core-executor:1.17.1
            args:
              - '--sdep'
              - ts-triton-4
              - '--namespace'
              - dhinesh
              - '--predictor'
              - default
              - '--http_port'
              - '8000'
              - '--grpc_port'
              - '5001'
              - '--protocol'
              - v2
              - '--transport'
              - rest
              - '--prometheus_path'
              - /prometheus
              - '--server_type'
              - rpc
              - '--log_work_buffer_size'
              - '10000'
              - '--log_write_timeout_ms'
              - '2000'
              - '--full_health_checks=false'
            ports:
              - name: http
                containerPort: 8000
                protocol: TCP
              # - name: metrics
              #   containerPort: 8000
              #   protocol: TCP
              - name: grpc
                containerPort: 5001
                protocol: TCP
            env:
              - name: ENGINE_PREDICTOR
                value: [REDACTED]
              - name: REQUEST_LOGGER_DEFAULT_ENDPOINT
                value: http://default-broker
              - name: SELDON_LOG_LEVEL
                value: DEBUG
            resources:
              limits:
                cpu: 500m
                memory: 512Mi
              requests:
                cpu: 500m
                memory: 512Mi
            volumeMounts:
              - name: seldon-podinfo
                mountPath: /etc/podinfo
              - name: kube-api-access-xtxmz
                readOnly: true
                mountPath: /var/run/secrets/kubernetes.io/serviceaccount
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            imagePullPolicy: IfNotPresent
            securityContext:
              runAsUser: 8888
              allowPrivilegeEscalation: false

    graph:
      children: []
      envSecretRefName: seldon-rclone-secret
      implementation: TRITON_SERVER
      modelUri: s3://hotspot-detect/ts-models/
      name: hotspot-detect
      parameters: []
    labels:
      sidecar.istio.io/inject: 'false'
    name: default
    replicas: 1
    svcOrchSpec:
      env:
      - name: SELDON_LOG_LEVEL
        value: DEBUG
  protocol: v2

Content of the S3 bucket with the Python-type model:

aws s3 ls hotspot-detect/ts-models/hotspot-detect/
                           PRE 1/
2023-10-09 10:48:42  133624020 HotspotDetectEnv.tar.gz
2023-10-10 10:27:04        762 config.pbtxt


Expected behavior: The server should serve predictions.

tanmayv25 commented 1 year ago

@tomaszstachera There are a number of errors reported when loading the model.

E1011 08:02:59.085540 1 logging.cc:40] [runtime.cpp::isCudaInstalledCorrectly::38] Error Code 6: Internal Error (CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)

That is why the server failed and exited. This appears to be an issue with the CUDA setup on the machine.

Additionally, 21.08 is an almost two-year-old release. Can you move to a newer version?

tomaszstachera commented 1 year ago

> @tomaszstachera There are a number of errors reported when loading the model.
>
> E1011 08:02:59.085540 1 logging.cc:40] [runtime.cpp::isCudaInstalledCorrectly::38] Error Code 6: Internal Error (CUDA initialization failure with error 35. Please check your CUDA installation:  http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html)
>
> That is why the server failed and exited. This appears to be an issue with the CUDA setup on the machine.
>
> Additionally, 21.08 is an almost two-year-old release. Can you move to a newer version?

You are wrong; the root cause of the issue was additional files in the S3 bucket, which is mentioned nowhere in the logs. After cleaning up the bucket by trial and error, it worked. There was no need to fix the CUDA errors.
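
For reference: with model_control_mode MODE_NONE (shown in the startup table above), Triton treats every top-level directory under --model-repository as a model and refuses to start if any of them fails to load. The stray entry does arguably show up in the log, as "Poll failed for model directory '1': unexpected platform type  for 1", which looks like an extra top-level 1/ directory in the repository; that single poll failure would produce "error: creating server: Internal - failed to load all models" even though hotspot-detect itself loaded successfully. A cleaned-up repository, sketched from the config.pbtxt and paths in the logs, would look like:

/mnt/models/
    HotspotDetectEnv.tar.gz        (path referenced by EXECUTION_ENV_PATH)
    hotspot-detect/
        config.pbtxt
        1/
            model.py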

The newest version from here, https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver/tags (23.09-py3), throws different boto3 errors, so it is also not working.

BTW, I am using Seldon, and its newest version points to 21.08: https://artifacthub.io/packages/helm/seldon/seldon-core-operator