Mrzhiyao closed this issue 10 months ago.
Did you use the --gpus param when starting Docker?
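For example, recreating the container with GPU access looks roughly like this (the image tag matches the 22.07 release shown in the logs below; the model path is illustrative):

docker run --gpus all -td -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /path/to/models:/models nvcr.io/nvidia/tritonserver:22.07-py3 tritonserver --model-repository=/models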
This problem was solved after I restarted the container, but a new error occurred when executing the program.
Traceback (most recent call last):
File "/home/eg/PycharmProjects/Towhee/triton_endcod.py", line 8, in
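For reference, the failing script presumably follows Towhee's documented Triton client usage, roughly like this (a minimal sketch; the URL and input text are assumptions, not the actual contents of triton_endcod.py):

from towhee import triton_client

# Connect to the Triton server's HTTP endpoint and run one request;
# this call hangs and eventually times out if the server is unreachable.
client = triton_client.Client(url='localhost:8000')
res = client('Hello, Towhee!')
print(res)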
Yes, the problem was solved after I recreated the container, but a new problem appeared. Do you know how to solve this problem?
It seems that access to the Triton server timed out. Are there any logs on the server?
docker logs shows the following:
NVIDIA Release 22.07 (build 41737377) Triton Server Version 2.24.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
I1109 06:53:09.532688 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6a4e000000' with size 268435456
I1109 06:53:09.533016 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I1109 06:53:09.536004 1 model_repository_manager.cc:1206] loading: pipeline:1
I1109 06:53:09.536049 1 model_repository_manager.cc:1206] loading: sentence-embedding.sbert-0:1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:11.225232 1 onnxruntime.cc:2458] TRITONBACKEND_Initialize: onnxruntime
I1109 06:53:11.225295 1 onnxruntime.cc:2468] Triton TRITONBACKEND API version: 1.10
I1109 06:53:11.225317 1 onnxruntime.cc:2474] 'onnxruntime' TRITONBACKEND API version: 1.10
I1109 06:53:11.225331 1 onnxruntime.cc:2504] backend configuration: {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}}
I1109 06:53:11.259270 1 onnxruntime.cc:2560] TRITONBACKEND_ModelInitialize: sentence-embedding.sbert-0 (version 1)
W1109 06:53:14.630221 1 onnxruntime.cc:787] autofilled max_batch_size to 4 for model 'sentence-embedding.sbert-0' since batching is supporrted but no max_batch_size is specified in model configuration. Must specify max_batch_size to utilize autofill with a larger max batch size
I1109 06:53:14.685000 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_0 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:17.996107 1 onnxruntime.cc:2603] TRITONBACKEND_ModelInstanceInitialize: sentence-embedding.sbert-0_0 (GPU device 0)
I1109 06:53:20.312004 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_1 (CPU device 0)
I1109 06:53:20.312255 1 model_repository_manager.cc:1352] successfully loaded 'sentence-embedding.sbert-0' version 1
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:23.568245 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_2 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:26.839855 1 python_be.cc:1767] TRITONBACKEND_ModelInstanceInitialize: pipeline_0_3 (CPU device 0)
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (2.0.7) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
I1109 06:53:30.081773 1 model_repository_manager.cc:1352] successfully loaded 'pipeline' version 1
I1109 06:53:30.082043 1 server.cc:559]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1109 06:53:30.082215 1 server.cc:586]
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend     | Path                                                            | Config                                                                                                                                                        |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python      | /opt/tritonserver/backends/python/libtriton_python.so           | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
| onnxruntime | /opt/tritonserver/backends/onnxruntime/libtriton_onnxruntime.so | {"cmdline":{"auto-complete-config":"true","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+-------------+-----------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1109 06:53:30.082348 1 server.cc:629]
+----------------------------+---------+--------+
| Model                      | Version | Status |
+----------------------------+---------+--------+
| pipeline                   | 1       | READY  |
| sentence-embedding.sbert-0 | 1       | READY  |
+----------------------------+---------+--------+
I1109 06:53:30.135753 1 metrics.cc:650] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3090
I1109 06:53:30.136027 1 tritonserver.cc:2176]
I1109 06:53:30.137643 1 grpc_server.cc:4608] Started GRPCInferenceService at 0.0.0.0:8001
I1109 06:53:30.137940 1 http_server.cc:3312] Started HTTPService at 0.0.0.0:8000
I1109 06:53:30.179419 1 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002
Check that the server is available: curl http://0.0.0.0:8000/v2/models/stats
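The same check can be done from Python if curl is not handy (a minimal sketch; it assumes the server's HTTP port is reachable on localhost:8000):

import requests

# Triton implements the KServe v2 HTTP endpoints: /v2/health/ready
# returns 200 once all models are loaded and the server accepts requests.
base = 'http://localhost:8000'
print(requests.get(f'{base}/v2/health/ready', timeout=5).status_code)
print(requests.get(f'{base}/v2/models/stats', timeout=5).json())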
I mapped the local port to 8010, so I get the result below. What may be the cause of the error in this case? Thank you for your help.
(base) eg@eg-HP-Z8-G4-Workstation:~$ curl http://0.0.0.0:8010/v2/models/stats
{
  "model_stats": [
    {
      "name": "pipeline",
      "version": "1",
      "last_inference": 0,
      "inference_count": 0,
      "execution_count": 0,
      "inference_stats": {
        "success": {"count": 0, "ns": 0},
        "fail": {"count": 0, "ns": 0},
        "queue": {"count": 0, "ns": 0},
        "compute_input": {"count": 0, "ns": 0},
        "compute_infer": {"count": 0, "ns": 0},
        "compute_output": {"count": 0, "ns": 0},
        "cache_hit": {"count": 0, "ns": 0},
        "cache_miss": {"count": 0, "ns": 0}
      },
      "batch_stats": []
    },
    {
      "name": "sentence-embedding.sbert-0",
      "version": "1",
      "last_inference": 0,
      "inference_count": 0,
      "execution_count": 0,
      "inference_stats": {
        "success": {"count": 0, "ns": 0},
        "fail": {"count": 0, "ns": 0},
        "queue": {"count": 0, "ns": 0},
        "compute_input": {"count": 0, "ns": 0},
        "compute_infer": {"count": 0, "ns": 0},
        "compute_output": {"count": 0, "ns": 0},
        "cache_hit": {"count": 0, "ns": 0},
        "cache_miss": {"count": 0, "ns": 0}
      },
      "batch_stats": []
    }
  ]
}
Try ops.sentence_embedding.transformers; sbert has some bugs. This pipeline works fine.
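A minimal sketch of the suggested pipeline, assuming Towhee's pipe API and the sentence-transformers/all-MiniLM-L6-v2 model name (swap in whichever model you were using with sbert):

from towhee import pipe, ops

# Build a text -> embedding pipeline on the transformers operator
# instead of the sbert one.
sentence_embedding = (
    pipe.input('text')
        .map('text', 'vec', ops.sentence_embedding.transformers(
            model_name='sentence-transformers/all-MiniLM-L6-v2'))
        .output('text', 'vec')
)

res = sentence_embedding('Hello, Towhee!')  # returns a DataQueue
print(res.get())                            # [text, vec]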
Thank you for your help; I think my problem has been resolved. One more question: which parameters can further improve encoding speed by accelerating model inference through the Triton server?
It is possible to optimize performance by adjusting parameters such as the number of instances and batch size. For more information, please refer to the Triton documentation: https://github.com/triton-inference-server/server
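For instance, instance count and dynamic batching are set in each model's config.pbtxt in the model repository; a hypothetical fragment (values are illustrative, not tuned for this model):

max_batch_size: 32
instance_group [
  {
    count: 2        # run two execution instances of the model
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 100
}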
Thank you very much for your help. I think my problem has been resolved.
Is there an existing issue for this?
Current Behavior
root@85d70c862b32:/opt/tritonserver# tritonserver --model-repository `pwd`/models
W1109 05:31:06.568839 124 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1109 05:31:06.568981 124 cuda_memory_manager.cc:115] CUDA memory pool disabled
I1109 05:31:06.569292 124 tritonserver.cc:2176]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                         |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                        |
| server_version                   | 2.24.0                                                                                                                                                                                        |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /opt/tritonserver/models                                                                                                                                                                      |
| model_control_mode               | MODE_NONE                                                                                                                                                                                     |
| strict_model_config              | 0                                                                                                                                                                                             |
| rate_limit                       | OFF                                                                                                                                                                                           |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                     |
| response_cache_byte_size         | 0                                                                                                                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                                                                                                           |
| strict_readiness                 | 1                                                                                                                                                                                             |
| exit_timeout                     | 30                                                                                                                                                                                            |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I1109 05:31:06.569348 124 server.cc:257] No server context available. Exiting immediately.
error: creating server: Internal - failed to stat file /opt/tritonserver/models
Expected Behavior
I'm following the official documentation to deploy the Triton server and use Towhee to speed up encoding.
I got an error at the “Start the Triton server” step after entering the server container. However, I can use Towhee for encoding in my local environment when I don't go through the Triton server. The error message suggests that the CUDA driver and runtime versions may be the reason it cannot run. How can I continue?
Steps To Reproduce
Environment
Anything else?
No response