Closed chenyufeifei closed 12 months ago
@nnshah1 We are currently writing a plugin to speed up model downloads. Can you help us look into this problem?
@gaius-qi - Will look into the issue and respond today.
Thanks
I was able to build and add the repository agent successfully.
A couple notes:
1) Repository agents are only loaded if a model is found with a corresponding section in its config.pbtxt. This is described here: https://github.com/triton-inference-server/checksum_repository_agent/tree/main#using-the-checksum-repository-agent
2) I found that the table of repository agents is only updated on a successful model load. That is, if the checksum was configured but had an incorrect hash or pointed to a missing file, the model didn't load and the repository agent wasn't listed (even though it had been used to prevent the model from loading).
Can you confirm you've added the sections to your model config when adding the checksum repoagent?
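For reference, the section the README describes looks roughly like the fragment below. The model path and hash value are placeholders; the exact key format (`MD5:<path relative to the model directory>`) is documented in the linked checksum_repository_agent README.

```
model_repository_agents
{
  agents [
    {
      name: "checksum",
      parameters [
        {
          key: "MD5:1/model.onnx",
          value: "<expected-md5-hex-digest>"
        }
      ]
    }
  ]
}
```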
NVIDIA Release 21.09 (build 27443074)
Copyright (c) 2018-2021, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use Docker with NVIDIA Container Toolkit to start this container; see https://github.com/NVIDIA/nvidia-docker.
I1011 10:46:29.450213 1 libtorch.cc:1030] TRITONBACKEND_Initialize: pytorch
I1011 10:46:29.450536 1 libtorch.cc:1040] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.450548 1 libtorch.cc:1046] 'pytorch' TRITONBACKEND API version: 1.5
2023-10-11 10:46:29.637876: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
I1011 10:46:29.784232 1 tensorflow.cc:2170] TRITONBACKEND_Initialize: tensorflow
I1011 10:46:29.784259 1 tensorflow.cc:2180] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.784625 1 tensorflow.cc:2186] 'tensorflow' TRITONBACKEND API version: 1.5
I1011 10:46:29.784649 1 tensorflow.cc:2210] backend configuration: {}
I1011 10:46:29.799076 1 onnxruntime.cc:1997] TRITONBACKEND_Initialize: onnxruntime
I1011 10:46:29.799102 1 onnxruntime.cc:2007] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.799324 1 onnxruntime.cc:2013] 'onnxruntime' TRITONBACKEND API version: 1.5
I1011 10:46:29.840653 1 openvino.cc:1193] TRITONBACKEND_Initialize: openvino
I1011 10:46:29.840679 1 openvino.cc:1203] Triton TRITONBACKEND API version: 1.5
I1011 10:46:29.840681 1 openvino.cc:1209] 'openvino' TRITONBACKEND API version: 1.5
W1011 10:46:29.841642 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1011 10:46:29.841978 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
E1011 10:46:29.884081 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
I1011 10:46:29.884126 1 server.cc:519]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1011 10:46:29.884159 1 server.cc:546]
+-------------+-----------------------------------------------------------------+--------+
| Backend     | Path                                                            | Config |
+-------------+-----------------------------------------------------------------+--------+
| pytorch     | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so         | {}     |
| tensorflow  | /opt/tritonserver/backends/tensorflow1/libtriton_tensorflow1.so | {}     |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {}     |
| openvino    | /opt/tritonserver/backends/openvino/libtriton_openvino.so       | {}     |
+-------------+-----------------------------------------------------------------+--------+
I1011 10:46:29.884184 1 server.cc:589]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+
I1011 10:46:29.884260 1 tritonserver.cc:1836]
+----------------------------------+------------------------------------------------------------------------------+
| Option                           | Value                                                                        |
+----------------------------------+------------------------------------------------------------------------------+
| server_id                        | triton                                                                       |
| server_version                   | 2.14.0                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) |
|                                  | schedule_policy model_configuration system_shared_memory cuda_shared_memory  |
|                                  | binary_tensor_data statistics                                                |
| model_repository_path[0]         | s3://172.17.0.2:9000/models                                                  |
| model_control_mode               | MODE_POLL                                                                    |
| strict_model_config              | 1                                                                            |
| rate_limit                       | OFF                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                    |
| min_supported_compute_capability | 6.0                                                                          |
| strict_readiness                 | 1                                                                            |
| exit_timeout                     | 30                                                                           |
+----------------------------------+------------------------------------------------------------------------------+
E1011 10:46:29.884317 1 tritonserver.cc:1845] Internal: failed to load all models
I1011 10:46:29.892605 1 grpc_server.cc:4111] Started GRPCInferenceService at 0.0.0.0:8001
I1011 10:46:29.892740 1 http_server.cc:2803] Started HTTPService at 0.0.0.0:8000
I1011 10:46:29.937924 1 http_server.cc:162] Started Metrics Service at 0.0.0.0:8002
E1011 10:46:29.974866 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:33.018474 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:36.054908 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
E1011 10:46:39.096391 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
Sorry for the formatting. Loading checksum_repository_agent into the Triton Inference Server succeeds, but when I try to use it (polling models from S3) it fails with: E1011 10:46:33.018474 1 model_repository_manager.cc:1890] Poll failed for model directory 'simple': Unsupported filesystem: 1
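For context, Triton's documented way to use an S3 (or MinIO) model repository is to pass the credentials via the standard AWS environment variables and point `--model-repository` at the `s3://host:port/bucket` path. A rough sketch of such an invocation follows; the endpoint, bucket, and credential values are placeholders, not taken from this thread:

```
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -e AWS_ACCESS_KEY_ID=<access_key> \
  -e AWS_SECRET_ACCESS_KEY=<secret_key> \
  -e AWS_DEFAULT_REGION=<region> \
  -v /<repository_agent_directory>/agent_repository:/opt/tritonserver/repoagents \
  nvcr.io/nvidia/tritonserver:23.08-py3 \
  tritonserver --model-repository=s3://172.17.0.2:9000/models
```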
Thank you, the issue has been resolved.
@Vanish1011 This seems to be a different issue than the original question with regards to loading and using the checksum_repository_agent. Can you open a new ticket?
Original question resolved. A separate issue (https://github.com/triton-inference-server/server/issues/6346#issuecomment-1757469895) regarding the interaction between the S3 model repository and the checksum repository agent is to be filed separately (@Vanish1011).
@nnshah1 Thanks. Please check this out: https://github.com/triton-inference-server/server/issues/6420
Description
I want to try loading checksum_repository_agent into the Triton Inference Server, but the Repository Agent table is empty at startup.
Triton Information
nvcr.io/nvidia/tritonserver:23.08-py3
Are you using the Triton container or did you build it yourself?
To Reproduce
1. Build the checksum repository agent to generate libtritonrepoagent_checksum.so.
2. Place libtritonrepoagent_checksum.so in the <repository_agent_directory>/agent_repository/checksum directory.
3. Run: docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v <model_directory>/model_repository:/models -v /<repository_agent_directory>/agent_repository:/opt/tritonserver/repoagents nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models
The program runs successfully, but the Repository Agent table is empty.
Expected behavior
Checksum agent loaded successfully.
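The checksum agent verifies MD5 digests declared in the model config against the files on disk. To produce the expected hash value for a model file, a small helper along these lines can be used; this is an illustrative sketch using only Python's standard library, and the model path in the comment is a placeholder:

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks so
    large model files do not need to fit in memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Example (placeholder path): paste the printed digest into the
# corresponding "value" field of the model's config.
# print(md5_of_file("model_repository/simple/1/model.graphdef"))
```

If the digest in the config does not match what this computes for the file, the agent is expected to reject the model load.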