triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

error: creating server: Internal - failed to load all models #6416

Closed xiaohuimc closed 11 months ago

xiaohuimc commented 12 months ago


Triton Information

What version of Triton are you using?

nvcr.io/nvidia/tritonserver:23.09-py3

Are you using the Triton container or did you build it yourself?

To Reproduce

$ mkdir -p /workspace/server/docs/examples/model_repository/face_test/1
$ cd /workspace/server/docs/examples/model_repository/face_test/1
$ wget https://github.com/facefusion/facefusion-assets/releases/download/models/inswapper_128.onnx
$ cd ..
$ tree
.
├── 1
│   └── inswapper_128.onnx
└── config.pbtxt
$ cat config.pbtxt 
name: "face_test"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "target"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  },
  {
    name: "source"
    data_type: TYPE_FP32
    dims: [ 512 ]
    reshape { shape: [ 1, 512 ] }
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  }

$ docker run --gpus=1,2 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /workspace/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

..............

I1011 13:28:20.937908 1 model_lifecycle.cc:818] successfully loaded 'inception_graphdef'
I1011 13:28:21.029577 1 model_lifecycle.cc:818] successfully loaded 'densenet_onnx'
I1011 13:28:21.029695 1 server.cc:592] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1011 13:28:21.029727 1 server.cc:619] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorflow  | /opt/tritonserver/backends/tensorflow/libtriton_tensorflow.so   | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1011 13:28:21.029780 1 server.cc:662] 
+--------------------+---------+--------+
| Model              | Version | Status |
+--------------------+---------+--------+
| densenet_onnx      | 1       | READY  |
| inception_graphdef | 1       | READY  |
| simple             | 1       | READY  |
| simple_identity    | 1       | READY  |
| simple_int8        | 1       | READY  |
| simple_sequence    | 1       | READY  |
| simple_test        | 1       | READY  |
+--------------------+---------+--------+

I1011 13:28:21.060600 1 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1011 13:28:21.060614 1 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA GeForce RTX 3090
I1011 13:28:21.060722 1 metrics.cc:710] Collecting CPU metrics
I1011 13:28:21.060814 1 tritonserver.cc:2437] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.38.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1011 13:28:21.060819 1 server.cc:293] Waiting for in-flight requests to complete.
I1011 13:28:21.060825 1 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1011 13:28:21.060988 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061004 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061022 1 server.cc:324] All models are stopped, unloading models
I1011 13:28:21.061034 1 server.cc:331] Timeout 30: Found 7 live models and 0 in-flight non-inference requests
I1011 13:28:21.061229 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061244 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061284 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061297 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061300 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061320 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061388 1 model_lifecycle.cc:603] successfully unloaded 'simple_int8' version 1
I1011 13:28:21.061381 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061397 1 onnxruntime.cc:2742] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061414 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061437 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061439 1 model_lifecycle.cc:603] successfully unloaded 'simple_test' version 1
I1011 13:28:21.061452 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061513 1 model_lifecycle.cc:603] successfully unloaded 'simple_sequence' version 1
I1011 13:28:21.061611 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061635 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061686 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061705 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061719 1 tensorflow.cc:2770] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.061735 1 tensorflow.cc:2709] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.061801 1 model_lifecycle.cc:603] successfully unloaded 'simple' version 1
I1011 13:28:21.061807 1 model_lifecycle.cc:603] successfully unloaded 'simple_identity' version 1
I1011 13:28:21.063567 1 model_lifecycle.cc:603] successfully unloaded 'inception_graphdef' version 1
I1011 13:28:21.065593 1 onnxruntime.cc:2742] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I1011 13:28:21.070761 1 onnxruntime.cc:2666] TRITONBACKEND_ModelFinalize: delete model state
I1011 13:28:21.070789 1 model_lifecycle.cc:603] successfully unloaded 'densenet_onnx' version 1
I1011 13:28:22.061147 1 server.cc:331] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models



tanmayv25 commented 12 months ago

Looks like the actual error from loading the "face_test" model might be buried in the elided portion ("..............") of the following log section:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

..............

I1011 13:28:20.937908 1 model_lifecycle.cc:818] successfully loaded 'inception_graphdef'
I1011 13:28:21.029577 1 model_lifecycle.cc:818] successfully loaded 'densenet_onnx'

I would propose removing all the other models from the model repository and launching Triton again with only the face_test model. You might have to rename inswapper_128.onnx to model.onnx, or set inswapper_128.onnx as the default_model_filename in config.pbtxt.
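For reference, the default_model_filename override would look roughly like this (a sketch of the relevant config.pbtxt lines only; the input/output sections would still be needed unless you rely on auto-complete):

```
name: "face_test"
platform: "onnxruntime_onnx"
# Tell Triton to load this file instead of the default model.onnx
default_model_filename: "inswapper_128.onnx"
```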

xiaohuimc commented 12 months ago

The other models have been deleted and inswapper_128.onnx has been renamed to model.onnx; the run log is as follows:

root@07-3090-64:/workspace# docker run --gpus=1,2 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /workspace/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I1012 01:20:59.153546 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f6b66000000' with size 268435456
I1012 01:20:59.153790 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1012 01:20:59.153793 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
W1012 01:20:59.201359 1 server.cc:238] failed to enable peer access for some device pairs
[libprotobuf ERROR /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:335] Error parsing text-format inference.ModelConfig: 22:11: Message type "inference.ModelOutput" has no field named "format".
E1012 01:20:59.202042 1 model_repository_manager.cc:1309] Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt
I1012 01:20:59.202060 1 server.cc:592] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1012 01:20:59.202065 1 server.cc:619] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I1012 01:20:59.202069 1 server.cc:662] 
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I1012 01:20:59.232922 1 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1012 01:20:59.232939 1 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA GeForce RTX 3090
I1012 01:20:59.233030 1 metrics.cc:710] Collecting CPU metrics
I1012 01:20:59.233118 1 tritonserver.cc:2437] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.38.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I1012 01:20:59.233123 1 server.cc:293] Waiting for in-flight requests to complete.
I1012 01:20:59.233125 1 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1012 01:20:59.233127 1 server.cc:324] All models are stopped, unloading models
I1012 01:20:59.233128 1 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
tanmayv25 commented 12 months ago

Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt

There is some issue with the formatting of your config.pbtxt. Try removing config.pbtxt and rerunning; Triton should auto-complete the config for you.
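Once the server is up with the auto-completed config, you can inspect what Triton generated through its HTTP API (assuming the default port 8000 is mapped, as in the docker run command above):

```shell
# Fetch the auto-completed model configuration from the running server
curl localhost:8000/v2/models/face_test/config | python3 -m json.tool
```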

tanmayv25 commented 12 months ago

I think you might be missing a closing square bracket for the output block:

name: "face_test"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "target"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  },
  {
    name: "source"
    data_type: TYPE_FP32
    dims: [ 512 ]
    reshape { shape: [ 1, 512 ] }
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  }
]
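A quick illustrative sanity check (a hypothetical helper, not part of Triton — the real parser is protobuf's text_format) could have flagged the unbalanced bracket before starting the server:

```python
def check_brackets(text: str) -> list[str]:
    """Report unmatched [ ] and { } in a config.pbtxt-style text proto.

    A rough bracket-balance check only; it skips quoted strings and
    '#' comments but does not validate field names or types.
    """
    pairs = {']': '[', '}': '{'}
    stack = []        # (opening char, line number) of unmatched openers
    errors = []
    in_string = False
    for lineno, line in enumerate(text.splitlines(), 1):
        for ch in line:
            if ch == '"':
                in_string = not in_string
            elif in_string:
                continue
            elif ch == '#':       # rest of the line is a comment
                break
            elif ch in '[{':
                stack.append((ch, lineno))
            elif ch in ']}':
                if not stack or stack[-1][0] != pairs[ch]:
                    errors.append(f"line {lineno}: unexpected '{ch}'")
                else:
                    stack.pop()
        in_string = False         # strings do not span lines in text protos
    errors.extend(f"line {ln}: unclosed '{ch}'" for ch, ln in stack)
    return errors


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for err in check_brackets(f.read()):
            print(err)
```

Run as `python3 check_brackets.py config.pbtxt`; on the original config it would report the `[` opened by `output` as unclosed.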
xiaohuimc commented 12 months ago

Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt

There is some issue in formatting your config.pbtxt. Try removing config.pbtxt and rerunning. Triton should auto-complete the config for you.

Thank you very much. After removing config.pbtxt, it runs successfully.

xiaohuimc commented 12 months ago

I think you might be missing a closing square bracket for the output block:


name: "face_test"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
  {
    name: "target"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  },
  {
    name: "source"
    data_type: TYPE_FP32
    dims: [ 512 ]
    reshape { shape: [ 1, 512 ] }
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    format: FORMAT_NCHW
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  }
]

Yes, this configuration file also runs successfully. Thank you.