Closed: xiaohuimc closed this issue 11 months ago.
It looks like the actual error from loading the "face_test" model might be buried in the following section:
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
..............
I1011 13:28:20.937908 1 model_lifecycle.cc:818] successfully loaded 'inception_graphdef'
I1011 13:28:21.029577 1 model_lifecycle.cc:818] successfully loaded 'densenet_onnx'
I would propose removing all the other models from the model_repository and trying to launch Triton again with only the face_test model. You might also have to rename inswapper_128.onnx to model.onnx, or provide inswapper_128.onnx as default_model_filename in config.pbtxt.
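For reference, a rough sketch of what the second option could look like; the 1/ version directory follows the standard model-repository layout, and the exact paths and filenames here are only assumptions:

model_repository/
  face_test/
    config.pbtxt
    1/
      inswapper_128.onnx

# config.pbtxt (only the filename override shown; other fields omitted)
name: "face_test"
platform: "onnxruntime_onnx"
default_model_filename: "inswapper_128.onnx"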
The other models have been deleted and inswapper_128.onnx has been renamed to model.onnx; the run log is as follows:
root@07-3090-64:/workspace# docker run --gpus=1,2 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /workspace/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I1012 01:20:59.153546 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7f6b66000000' with size 268435456
I1012 01:20:59.153790 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1012 01:20:59.153793 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
W1012 01:20:59.201359 1 server.cc:238] failed to enable peer access for some device pairs
[libprotobuf ERROR /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:335] Error parsing text-format inference.ModelConfig: 22:11: Message type "inference.ModelOutput" has no field named "format".
E1012 01:20:59.202042 1 model_repository_manager.cc:1309] Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt
I1012 01:20:59.202060 1 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1012 01:20:59.202065 1 server.cc:619]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+
I1012 01:20:59.202069 1 server.cc:662]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I1012 01:20:59.232922 1 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1012 01:20:59.232939 1 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA GeForce RTX 3090
I1012 01:20:59.233030 1 metrics.cc:710] Collecting CPU metrics
I1012 01:20:59.233118 1 tritonserver.cc:2437]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.38.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1012 01:20:59.233123 1 server.cc:293] Waiting for in-flight requests to complete.
I1012 01:20:59.233125 1 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1012 01:20:59.233127 1 server.cc:324] All models are stopped, unloading models
I1012 01:20:59.233128 1 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
root@07-3090-64:/workspace# docker run --gpus=1,2 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /workspace/server/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:23.09-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 23.09 (build 69485437)
Triton Server Version 2.38.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I1012 01:23:24.386916 1 pinned_memory_manager.cc:241] Pinned memory pool is created at '0x7fc3b6000000' with size 268435456
I1012 01:23:24.387160 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I1012 01:23:24.387162 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 1 with size 67108864
W1012 01:23:24.434437 1 server.cc:238] failed to enable peer access for some device pairs
[libprotobuf ERROR /tmp/tritonbuild/tritonserver/build/_deps/repo-third-party-build/grpc-repo/src/grpc/third_party/protobuf/src/google/protobuf/text_format.cc:335] Error parsing text-format inference.ModelConfig: 23:11: Message type "inference.ModelOutput" has no field named "format".
E1012 01:23:24.435130 1 model_repository_manager.cc:1309] Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt
I1012 01:23:24.435147 1 server.cc:592]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1012 01:23:24.435153 1 server.cc:619]
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+
I1012 01:23:24.435159 1 server.cc:662]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I1012 01:23:24.466427 1 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1012 01:23:24.466439 1 metrics.cc:817] Collecting metrics for GPU 1: NVIDIA GeForce RTX 3090
I1012 01:23:24.466523 1 metrics.cc:710] Collecting CPU metrics
I1012 01:23:24.466629 1 tritonserver.cc:2437]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.38.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /models |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| cuda_memory_pool_byte_size{1} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1012 01:23:24.466635 1 server.cc:293] Waiting for in-flight requests to complete.
I1012 01:23:24.466638 1 server.cc:309] Timeout 30: Found 0 model versions that have in-flight inferences
I1012 01:23:24.466640 1 server.cc:324] All models are stopped, unloading models
I1012 01:23:24.466641 1 server.cc:331] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
Poll failed for model directory 'face_test': failed to read text proto from /models/face_test/config.pbtxt
There is some issue in formatting your config.pbtxt. Try removing config.pbtxt and rerunning. Triton should auto-complete the config for you.
I think you might be missing a closing square brace for the output??
name: "face_test"
platform: "onnxruntime_onnx"
max_batch_size : 0
input [
{
name: "target"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 128, 128 ]
reshape { shape: [ 1, 3, 128, 128 ] }
},
{
name: "source"
data_type: TYPE_FP32
dims: [ 512 ]
reshape { shape: [ 1, 512 ] }
}
]
output [
{
name: "output"
data_type: TYPE_FP32
format: FORMAT_NCHW
dims: [ 3, 128, 128 ]
reshape { shape: [ 1, 3, 128, 128 ] }
}
]
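As a side note, the libprotobuf error in the logs above ('Message type "inference.ModelOutput" has no field named "format"') points at the format: FORMAT_NCHW line inside the output block; format is only defined for model inputs, not outputs. If a config along these lines still fails to parse, an output block without that line is worth trying:

output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 3, 128, 128 ]
    reshape { shape: [ 1, 3, 128, 128 ] }
  }
]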
Thank you very much. After removing config.pbtxt, the model loads and runs successfully.
Yes, the configuration file you posted above also runs successfully. Thank you.
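For anyone hitting this later: once the model loads with an auto-completed config, the configuration Triton generated can be fetched over the HTTP API and used as a starting point for a hand-written config.pbtxt. A minimal sketch, assuming the default HTTP port 8000 and the model name face_test:

curl localhost:8000/v2/models/face_test/config

The model_configuration extension is listed under server_extensions in the logs above, so this endpoint should be available on this build.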
Description
A clear and concise description of what the bug is.

Triton Information
What version of Triton are you using?
nvcr.io/nvidia/tritonserver:23.09-py3
Are you using the Triton container or did you build it yourself?

To Reproduce
Steps to reproduce the behavior.
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).

Expected behavior
A clear and concise description of what you expected to happen.