triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Unable to load repository agent #6346

Closed · chenyufeifei closed this issue 12 months ago

chenyufeifei commented 1 year ago

Description
I want to load the checksum_repository_agent into the Triton Inference Server, but the Repository Agent table is empty when the server starts.

Triton Information
nvcr.io/nvidia/tritonserver:23.08-py3

Are you using the Triton container or did you build it yourself?
Using the Triton container (the nvcr.io image above).

To Reproduce

1. Build the checksum repository agent:

mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install ..
make install

This generates libtritonrepoagent_checksum.so.

2. Place libtritonrepoagent_checksum.so in the <repository_agent_directory>/agent_repository/checksum directory.

3. Run:

docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v <model_directory>/model_repository:/models \
    -v <repository_agent_directory>/agent_repository:/opt/tritonserver/repoagents \
    nvcr.io/nvidia/tritonserver:23.08-py3 \
    tritonserver --model-repository=/models
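As a quick sanity check (an editor's suggestion, not part of the original report), the library should be visible inside the container at the path Triton's directory convention expects, <repoagent_directory>/<agent_name>/libtritonrepoagent_<agent_name>.so:

# <container> is a placeholder for the running container's name or ID
docker exec <container> ls /opt/tritonserver/repoagents/checksum/
# expected output: libtritonrepoagent_checksum.so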

The server starts successfully, but the Repository Agent table is empty:

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 23.08 (build 66820947)
Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I0919 13:10:55.030887 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I0919 13:10:55.031036 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I0919 13:10:55.031086 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W0919 13:10:55.031182 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I0919 13:10:55.031233 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I0919 13:10:55.039286 1 model_lifecycle.cc:462] loading: inception_graphdef:1
I0919 13:10:55.039451 1 model_lifecycle.cc:462] loading: simple:1
I0919 13:10:55.039586 1 model_lifecycle.cc:462] loading: simple_sequence:1
I0919 13:10:55.039776 1 model_lifecycle.cc:462] loading: densenet_onnx:1
I0919 13:10:55.040371 1 model_lifecycle.cc:462] loading: simple_dyna_sequence:1
I0919 13:10:55.040493 1 model_lifecycle.cc:462] loading: simple_identity:1
I0919 13:10:55.040900 1 model_lifecycle.cc:462] loading: simple_int8:1
I0919 13:10:55.041941 1 model_lifecycle.cc:462] loading: simple_string:1
I0919 13:10:55.651081 1 tensorflow.cc:2577] TRITONBACKEND_Initialize: tensorflow
I0919 13:10:55.651228 1 tensorflow.cc:2587] Triton TRITONBACKEND API version: 1.15
I0919 13:10:55.651288 1 tensorflow.cc:2593] 'tensorflow' TRITONBACKEND API version: 1.15
I0919 13:10:55.651344 1 tensorflow.cc:2617] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0919 13:10:55.651666 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple (version 1)
I0919 13:10:55.652011 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple_sequence (version 1)
I0919 13:10:55.652263 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: inception_graphdef (version 1)
I0919 13:10:55.655531 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_0 (CPU device 0)
I0919 13:10:55.656731 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_sequence_0_0 (CPU device 0)
2023-09-19 13:10:55.656868: I tensorflow/core/platform/cpu_feature_guard.cc:183] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0919 13:10:55.657407 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: inception_graphdef_0 (CPU device 0)
I0919 13:10:55.660041 1 onnxruntime.cc:2514] TRITONBACKEND_Initialize: onnxruntime
I0919 13:10:55.660086 1 onnxruntime.cc:2524] Triton TRITONBACKEND API version: 1.15
I0919 13:10:55.660106 1 onnxruntime.cc:2530] 'onnxruntime' TRITONBACKEND API version: 1.15
I0919 13:10:55.660125 1 onnxruntime.cc:2560] backend configuration:
{"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}}
I0919 13:10:55.857789 1 onnxruntime.cc:2625] TRITONBACKEND_ModelInitialize: densenet_onnx (version 1)
I0919 13:10:55.861536 1 onnxruntime.cc:692] skipping model configuration auto-complete for 'densenet_onnx': inputs and outputs already specified
I0919 13:10:55.866902 1 onnxruntime.cc:2690] TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_1 (CPU device 0)
I0919 13:10:55.867078 1 onnxruntime.cc:2690] TRITONBACKEND_ModelInstanceInitialize: densenet_onnx_0 (CPU device 0)
2023-09-19 13:10:55.909819: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
I0919 13:10:55.936268 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_1 (CPU device 0)
I0919 13:10:55.938849 1 model_lifecycle.cc:819] successfully loaded 'simple'
I0919 13:10:55.939535 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple_dyna_sequence (version 1)
I0919 13:10:55.942033 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_dyna_sequence_0_0 (CPU device 0)
I0919 13:10:55.943483 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_sequence_0_1 (CPU device 0)
W0919 13:10:55.946687 1 pinned_memory_manager.cc:134] failed to allocate pinned system memory: no pinned memory pool, falling back to non-pinned system memory
I0919 13:10:55.948400 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_dyna_sequence_0_1 (CPU device 0)
I0919 13:10:55.961713 1 model_lifecycle.cc:819] successfully loaded 'simple_sequence'
I0919 13:10:55.962067 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple_identity (version 1)
2023-09-19 13:10:55.963551: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.964083: I tensorflow/cc/saved_model/reader.cc:91] Reading meta graph with tags { serve }
2023-09-19 13:10:55.964191: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.965584: I tensorflow/cc/saved_model/loader.cc:334] SavedModel load for tags { serve }; Status: success: OK. Took 2059 microseconds.
I0919 13:10:55.967256 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_identity_0 (CPU device 0)
2023-09-19 13:10:55.967579: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.967963: I tensorflow/cc/saved_model/reader.cc:91] Reading meta graph with tags { serve }
2023-09-19 13:10:55.968083: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.971860: I tensorflow/cc/saved_model/loader.cc:334] SavedModel load for tags { serve }; Status: success: OK. Took 4290 microseconds.
I0919 13:10:55.974414 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_identity_1 (CPU device 0)
2023-09-19 13:10:55.974657: I tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.975053: I tensorflow/cc/saved_model/reader.cc:91] Reading meta graph with tags { serve }
2023-09-19 13:10:55.975159: I tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/simple_identity/1/model.savedmodel
2023-09-19 13:10:55.976790: I tensorflow/cc/saved_model/loader.cc:334] SavedModel load for tags { serve }; Status: success: OK. Took 1956 microseconds.
I0919 13:10:55.979052 1 model_lifecycle.cc:819] successfully loaded 'simple_identity'
I0919 13:10:55.979370 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple_int8 (version 1)
I0919 13:10:55.981441 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_int8_0 (CPU device 0)
I0919 13:10:55.990599 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_int8_1 (CPU device 0)
I0919 13:10:56.001892 1 model_lifecycle.cc:819] successfully loaded 'simple_int8'
I0919 13:10:56.002174 1 tensorflow.cc:2683] TRITONBACKEND_ModelInitialize: simple_string (version 1)
I0919 13:10:56.003767 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_string_0 (CPU device 0)
I0919 13:10:56.008706 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: simple_string_1 (CPU device 0)
I0919 13:10:56.011434 1 model_lifecycle.cc:819] successfully loaded 'simple_dyna_sequence'
I0919 13:10:56.011666 1 model_lifecycle.cc:819] successfully loaded 'simple_string'
I0919 13:10:56.606452 1 tensorflow.cc:2732] TRITONBACKEND_ModelInstanceInitialize: inception_graphdef_1 (CPU device 0)
I0919 13:10:57.272452 1 model_lifecycle.cc:819] successfully loaded 'inception_graphdef'
I0919 13:10:57.284311 1 model_lifecycle.cc:819] successfully loaded 'densenet_onnx'
I0919 13:10:57.284919 1 server.cc:604] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0919 13:10:57.286128 1 server.cc:631] 
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}                                                                                                                                                            |
| tensorflow  | /opt/tritonserver/backends/tensorflow/libtriton_tensorflow.so   | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0919 13:10:57.286966 1 server.cc:674] 
+----------------------+---------+--------+
| Model                | Version | Status |
+----------------------+---------+--------+
| densenet_onnx        | 1       | READY  |
| inception_graphdef   | 1       | READY  |
| simple               | 1       | READY  |
| simple_dyna_sequence | 1       | READY  |
| simple_identity      | 1       | READY  |
| simple_int8          | 1       | READY  |
| simple_sequence      | 1       | READY  |
| simple_string        | 1       | READY  |
+----------------------+---------+--------+

I0919 13:10:57.287907 1 metrics.cc:703] Collecting CPU metrics
I0919 13:10:57.288396 1 tritonserver.cc:2435] 
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.37.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0919 13:10:57.297319 1 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I0919 13:10:57.298019 1 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I0919 13:10:57.347309 1 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002

Expected behavior
The checksum agent is loaded successfully and appears in the Repository Agent table.

gaius-qi commented 12 months ago

@nnshah1 We are currently writing a plugin to speed up model downloads. Can you help us look at this problem?

nnshah1 commented 12 months ago

@gaius-qi - Will look into the issue and respond today.

gaius-qi commented 12 months ago

> @gaius-qi - Will look into the issue and respond today.

Thanks

nnshah1 commented 12 months ago

I was able to build and add the repository agent successfully.

A couple notes:

1) Repository agents are only loaded if a model is found with a corresponding section in its config.pbtxt (https://github.com/triton-inference-server/checksum_repository_agent/tree/main#using-the-checksum-repository-agent).

This is described here:

https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/repository_agents.md#implementing-a-repository-agent

2) I found that the table for repository agents is only updated on a successful model load. That is, if the checksum was configured but had an incorrect hash or pointed to a missing file, the model didn't load and the repository agent wasn't listed (even though it had been used to prevent the model from loading).

Can you confirm you've added the sections to your model config when adding the checksum repoagent?
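For reference, a minimal model_repository_agents section in a model's config.pbtxt looks like the sketch below (the file name and hash are placeholders, not values from this thread; see the checksum repository agent README linked above for the exact format):

model_repository_agents
{
  agents [
    {
      name: "checksum",
      parameters [
        {
          key: "MD5:model.onnx"
          value: "<md5-hash-of-model.onnx>"
        }
      ]
    }
  ]
}

With a section like this in at least one model's configuration, Triton invokes the agent during that model's load, and the agent then appears in the Repository Agent table.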

Vanish1011 commented 12 months ago

@nnshah1
Description: I try to use s3 to poll my models (tritonserver --model-store=s3://172.17.0.2:9000/models),The models was successfully pulled up. But when i want to use repository agent(checksum) at the some time , it reported some wrongs. :============================= == Triton Inference Server ==

NVIDIA Release 21.09 (build 27443074)

Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use Docker with NVIDIA Container Toolkit to start this container; see https://github.com/NVIDIA/nvidia-docker.

I1011 10:46:29.450213 1 libtorch.cc:1030] TRITONBACKEND_Initialize: pytorch
I1011 10:46:29.450536 1 libtorch.cc:1040] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.450548 1 libtorch.cc:1046] 'pytorch' TRITONBACKEND API version: 1.5
2023-10-11 10:46:29.637876: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I1011 10:46:29.784232 1 tensorflow.cc:2170] TRITONBACKEND_Initialize: tensorflow
I1011 10:46:29.784259 1 tensorflow.cc:2180] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.784625 1 tensorflow.cc:2186] 'tensorflow' TRITONBACKEND API version: 1.5
I1011 10:46:29.784649 1 tensorflow.cc:2210] backend configuration: {}
I1011 10:46:29.799076 1 onnxruntime.cc:1997] TRITONBACKEND_Initialize: onnxruntime
I1011 10:46:29.799102 1 onnxruntime.cc:2007] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.799324 1 onnxruntime.cc:2013] 'onnxruntime' TRITONBACKEND API version: 1.5
I1011 10:46:29.840653 1 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I1011 10:46:29.840679 1 openvino.cc:1203] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.840681 1 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.5
W1011 10:46:29.841642 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1011 10:46:29.841978 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
E1011 10:46:29.884081 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
I1011 10:46:29.884126 1 server.cc:519]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I1011 10:46:29.884159 1 server.cc:546]
+-------------+------------------------------------------------------------------+--------+
| Backend     | Path                                                             | Config |
+-------------+------------------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+------------------------------------------------------------------+--------+

I1011 10:46:29.884184 1 server.cc:589]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I1011 10:46:29.884260 1 tritonserver.cc:1836]
+----------------------------------+-----------------------------+
| Option                           | Value                       |
+----------------------------------+-----------------------------+
| server_id                        | triton                      |
| server_version                   | 2.14.0                      |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0]         | s3://172.17.0.2:9000/models |
| model_control_mode               | MODE_POLL                   |
| strict_model_config              | 1                           |
| rate_limit                       | OFF                         |
| pinned_memory_pool_byte_size     | 268435456                   |
| min_supported_compute_capability | 6.0                         |
| strict_readiness                 | 1                           |
| exit_timeout                     | 30                          |
+----------------------------------+-----------------------------+

E1011 10:46:29.884317 1 tritonserver.cc:1845] Internal: failed to load all models
I1011 10:46:29.892605 1 grpc_server.cc:4111] Started GRPCInferenceService at 0.0.0.0:8001
I1011 10:46:29.892740 1 http_server.cc:2803] Started HTTPService at 0.0.0.0:8000
I1011 10:46:29.937924 1 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002
E1011 10:46:29.974866 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:33.018474 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:36.054908 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:39.096391 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1

Vanish1011 commented 12 months ago

Sorry for the formatting. Loading the checksum_repository_agent into the Triton Inference Server succeeds on its own, but when I try to use it while polling models from S3, it fails:

E1011 10:46:33.018474 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1

chenyufeifei commented 12 months ago

> I was able to build and add the repository agent successfully. [...] Can you confirm you've added the sections to your model config when adding the checksum repoagent?

Thank you, the issue has been resolved.

nnshah1 commented 12 months ago

@Vanish1011 This seems to be a different issue from the original question regarding loading and using the checksum_repository_agent. Can you open a new ticket?

nnshah1 commented 12 months ago

Original question resolved. The separate issue (https://github.com/triton-inference-server/server/issues/6346#issuecomment-1757469895) regarding the interaction between the S3 model repository and the checksum repository agent is to be filed separately (@Vanish1011).

Vanish1011 commented 12 months ago

@nnshah1 Thanks, please check out https://github.com/triton-inference-server/server/issues/6420