xxyux opened 6 months ago
It seems like I cannot run this command:
cp tensorrt_llm/build/lib/tensorrt_llm/libs/* /opt/tritonserver/backends/tensorrtllm/
because an error occurs in /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so.
Am I right?
@byshiue pls.
Both git branches are on main:
root@d0b11d0dea8b:/tensorrtllm_backend# git branch -av
* main 6e6e34e Update TensorRT-LLM backend (#272)
remotes/origin/HEAD -> origin/main
remotes/origin/fpetrini-cli-dev fda8635 Don't uninstall trt_llm
remotes/origin/fpetrini-triton-metrics 226c3c0 Updated gen script
remotes/origin/kaiyu/update-rel 9edd83a Update version.txt
remotes/origin/krish-fix-test 922b0e1 Fix test
remotes/origin/krish-trtllm-size 98c0a5f Fix up
remotes/origin/main 6e6e34e Update TensorRT-LLM backend (#272)
remotes/origin/r23.12 9aedcf3 Update TensorRT-LLM backend (#241)
remotes/origin/rel 4344654 Update TensorRT-LLM backend release branch (#260)
remotes/origin/release/0.5.0 47b609b Update doc (#78)
root@d0b11d0dea8b:/tensorrtllm_backend# cd tensorrt_llm/
root@d0b11d0dea8b:/tensorrtllm_backend/tensorrt_llm# git branch -av
* main 6cc5e17 Update issue templates
remotes/origin/HEAD -> origin/main
remotes/origin/gh-pages 0a75cdb Update gh-pages (#750)
remotes/origin/main 6cc5e17 Update issue templates
remotes/origin/rel 2f169d1 Add batch manager static lib for Windows (#814)
remotes/origin/release/0.5.0 a21e2f8 Fix an issue of
The TensorRT-LLM version of nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 is v0.7.0, so you will encounter such an issue when you build the engine with v0.7.1. I suggest using the Dockerfile to build the docker image again, to make sure your tritonserver docker also installs v0.7.1.
THX sir!!!
So, I should use these commands to build the image, whose tensorrt_llm version is v0.7.1:
# Update the submodules
cd tensorrtllm_backend
git lfs install
git submodule update --init --recursive
# Use the Dockerfile to build the backend in a container
# For x86_64
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .
and follow the remaining steps (from step 5 [build tensorrt_llm]) in my issue?
@byshiue pls
Yes.
Thanks!
I installed tensorrtllm_backend successfully using the image built with this command:
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm -f dockerfile/Dockerfile.trt_llm_backend .
After launching the server, I tested it in the ways described in this doc.
# Ask:
curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
# Answer:
{"cum_log_probs":0.0,"model_name":"ensemble","model_version":"1","output_log_probs":[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],"sequence_end":false,"sequence_id":0,"sequence_start":false,"text_output":"\nMachine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate"}
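For reference, the same request can be sent from Python instead of curl. This is a minimal sketch using only the standard library; the endpoint path and the payload fields are taken directly from the curl command above, and `build_generate_url` is a hypothetical helper introduced here for illustration:

```python
import json
import urllib.request

# Same payload as the curl command above; pad_id/end_id of 2 match the
# Llama tokenizer used in this thread.
payload = {
    "text_input": "What is machine learning?",
    "max_tokens": 20,
    "bad_words": "",
    "stop_words": "",
    "pad_id": 2,
    "end_id": 2,
}

def build_generate_url(host, model):
    """Build the Triton HTTP generate-extension URL for a model."""
    return f"http://{host}/v2/models/{model}/generate"

def generate(host="localhost:8000", model="ensemble"):
    """POST the payload to Triton's generate endpoint, return parsed JSON."""
    req = urllib.request.Request(
        build_generate_url(host, model),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server from this thread running:
#   print(generate()["text_output"])
```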
root@ps:/tensorrtllm_backend# export HF_LLAMA_MODEL=/data/llama/Llama-2-7b-hf/
root@ps:/tensorrtllm_backend# python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py --request-output-len 200 --tokenizer-dir ${HF_LLAMA_MODEL}
=========
Input sequence: [1, 19298, 297, 6641, 29899, 23027, 3444, 29892, 1105, 7598, 16370, 408, 263]
Got completed request
Input: Born in north-east France, Soyer trained as a
Output beam 0: . He was a member of the Société des Artistes Français and exhibited at the Paris Salon from 1861. He was also a member of the Société des Artistes Indépendants.
Soyer was a painter of genre scenes, portraits and landscapes. He was also a lithographer and etcher.
Soyer was a friend of the composer Hector Berlioz and the writer Victor Hugo.
Soyer died in Paris in 1907.
The artist's works can be found in the collections of the Musée d'Orsay in Paris, the Musée des Beaux-Arts in Nancy, the Musée des Beaux-Arts in Rouen, the Musée des Beaux-Arts in Reims, the Musée des Beaux-Arts in Lille, the Musée des Beaux-Arts in Le
Output sequence: [23187, 472, 278, 3067, 10936, 553, 1522, 2993, 29899, 1433, 1372, 297, 3681, 29889, 940, 471, 263, 4509, 310, 278, 21903, 553, 3012, 9230, 1352, 6899, 322, 10371, 1573, 472, 278, 3681, 3956, 265, 515, 29871, 29896, 29947, 29953, 29896, 29889, 940, 471, 884, 263, 4509, 310, 278, 21903, 553, 3012, 9230, 1894, 6430, 355, 1934, 29889, 13, 6295, 7598, 471, 263, 23187, 310, 16151, 20407, 29892, 2011, 336, 1169, 322, 2982, 1557, 11603, 29889, 940, 471, 884, 263, 301, 389, 1946, 261, 322, 634, 4630, 29889, 13, 6295, 7598, 471, 263, 5121, 310, 278, 18422, 379, 3019, 2292, 492, 2112, 322, 278, 9227, 12684, 20650, 29889, 13, 6295, 7598, 6423, 297, 3681, 297, 29871, 29896, 29929, 29900, 29955, 29889, 13, 1576, 7664, 29915, 29879, 1736, 508, 367, 1476, 297, 278, 16250, 310, 278, 26273, 270, 29915, 29949, 2288, 388, 297, 3681, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 24190, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 15915, 264, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 830, 9893, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 365, 1924, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 951]
Exception ignored in: <function InferenceServerClient.__del__ at 0x7fd52f563370>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 265, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2101, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2082, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'
but there has been an AttributeError: 'NoneType' object has no attribute 'StatusCode'.
What causes this error? How can I solve it?
@byshiue plss
but there has been an AttributeError: 'NoneType' object has no attribute 'StatusCode'.
@xxyux It's very likely that the issue is in one of the dependencies of TensorRT-LLM backend. I tried pip3 install -r requirements.txt
to update the dependencies, and the issue is gone. Could you please try that as well?
Hello, I get ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts: tensorrt-llm 0.7.0 requires transformers==4.33.1, but you have transformers 4.31.0, which is incompatible. And the error AttributeError: 'NoneType' object has no attribute 'StatusCode' still happens.
I got the same error ('NoneType' object has no attribute 'StatusCode') when I tested inflight_batcher_llm_client.py. I tried "pip3 install -r requirements.txt" but it didn't help. My versions: tensorrt 9.2.0.post12.dev5, tensorrt-llm 0.8.0, torch 2.1.2, transformers 4.36.1, triton 2.1.0.
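Since several posters above hit pin conflicts like "tensorrt-llm 0.7.0 requires transformers==4.33.1, but you have transformers 4.31.0", it can help to check installed versions against the expected pins before launching the server. A minimal sketch; the `pins` dict here uses the versions from the error message above as an example, not an authoritative requirements list:

```python
from importlib.metadata import version, PackageNotFoundError

# Example pins taken from the dependency-conflict message in this thread;
# replace with the pins from your checkout's requirements.txt.
pins = {"transformers": "4.33.1"}

def installed_versions(pkgs):
    """Return {package: installed version or None} for the given names."""
    out = {}
    for p in pkgs:
        try:
            out[p] = version(p)
        except PackageNotFoundError:
            out[p] = None
    return out

def check_pins(pins, installed):
    """Return a list of (package, wanted, found) tuples for mismatches."""
    mismatches = []
    for pkg, wanted in pins.items():
        found = installed.get(pkg)
        if found != wanted:
            mismatches.append((pkg, wanted, found))
    return mismatches

# Usage: check_pins(pins, installed_versions(pins)) -> [] when consistent.
```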
root@acce067401db:/home/zy/data8tb/zx/tensorrtllm_backend# python3 inflight_batcher_llm/client/inflight_batcher_llm_client.py --request-output-len 200 --tokenizer-dir ${HF_LLAMA_MODEL}
=========
Input sequence: [1, 19298, 297, 6641, 29899, 23027, 3444, 29892, 1105, 7598, 16370, 408, 263]
Got completed request
Input: Born in north-east France, Soyer trained as a
Output beam 0: . He was a member of the Société des Artistes Français and exhibited at the Paris Salon from 1861. He was also a member of the Société des Artistes Indépendants.
Soyer was a painter of genre scenes, portraits and landscapes. He was also a lithographer and etcher.
Soyer was a friend of the composer Hector Berlioz and the writer Victor Hugo.
Soyer died in Paris in 1907.
The artist's works can be found in the collections of the Musée d'Orsay in Paris, the Musée des Beaux-Arts in Nancy, the Musée des Beaux-Arts in Rouen, the Musée des Beaux-Arts in Reims, the Musée des Beaux-Arts in Lille, the Musée des Beaux-Arts in Le
Output sequence: [23187, 472, 278, 3067, 10936, 553, 1522, 2993, 29899, 1433, 1372, 297, 3681, 29889, 940, 471, 263, 4509, 310, 278, 21903, 553, 3012, 9230, 1352, 6899, 322, 10371, 1573, 472, 278, 3681, 3956, 265, 515, 29871, 29896, 29947, 29953, 29896, 29889, 940, 471, 884, 263, 4509, 310, 278, 21903, 553, 3012, 9230, 1894, 6430, 355, 1934, 29889, 13, 6295, 7598, 471, 263, 23187, 310, 16151, 20407, 29892, 2011, 336, 1169, 322, 2982, 1557, 11603, 29889, 940, 471, 884, 263, 301, 389, 1946, 261, 322, 634, 4630, 29889, 13, 6295, 7598, 471, 263, 5121, 310, 278, 18422, 379, 3019, 2292, 492, 2112, 322, 278, 9227, 12684, 20650, 29889, 13, 6295, 7598, 6423, 297, 3681, 297, 29871, 29896, 29929, 29900, 29955, 29889, 13, 1576, 7664, 29915, 29879, 1736, 508, 367, 1476, 297, 278, 16250, 310, 278, 26273, 270, 29915, 29949, 2288, 388, 297, 3681, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 24190, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 15915, 264, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 830, 9893, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 365, 1924, 29892, 278, 26273, 553, 1522, 2993, 29899, 1433, 1372, 297, 951]
Exception ignored in: <function InferenceServerClient.__del__ at 0x7f40ac90b370>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 265, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2181, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2162, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'
Same error. Triton: nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3, tensorrtllm_backend: v0.8.0, model: Mixtral-8x7b
same error +1
same error +1 Tensorrtllm_backend: v0.8.0 model: llama7b
Do not init the client yourself; use the client in a with block instead.
I installed tensorrtllm_backend in the following way:
docker pull nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3
docker run -v /data2/share/:/data/ -v /mnt/sdb/benchmark/xiangrui:/root -it -d --cap-add=SYS_PTRACE --cap-add=SYS_ADMIN --security-opt seccomp=unconfined --gpus=all --shm-size=16g --privileged --ulimit memlock=-1 --name=develop nvcr.io/nvidia/tritonserver:23.12-trtllm-python-py3 bash
Launching the server, I met this issue https://github.com/NVIDIA/TensorRT-LLM/issues/656:
Assertion failed: d == a + length (/app/tensorrt_llm/cpp/tensorrt_llm/plugins/gptAttentionCommon/gptAttentionCommon.cpp:326)
Then I ran
cp tensorrt_llm/build/lib/tensorrt_llm/libs/* /opt/tritonserver/backends/tensorrtllm/
which solved that problem. After launching with
python3 scripts/launch_triton_server.py --world_size 1 --model_repo=triton_model_repo/
the messages shown below appear:
I0107 12:18:03.565860 4223 server.cc:676]
+------------------+---------+--------+
| Model            | Version | Status |
+------------------+---------+--------+
| ensemble         | 1       | READY  |
| postprocessing   | 1       | READY  |
| preprocessing    | 1       | READY  |
| tensorrt_llm     | 1       | READY  |
| tensorrt_llm_bls | 1       | READY  |
+------------------+---------+--------+
I0107 12:18:03.712204 4223 metrics.cc:817] Collecting metrics for GPU 0: NVIDIA RTX A6000
I0107 12:18:03.726368 4223 metrics.cc:710] Collecting CPU metrics
I0107 12:18:03.726577 4223 tritonserver.cc:2483]
+----------------------------------+------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                |
+----------------------------------+------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                               |
| server_version                   | 2.41.0                                                                                               |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_c |
|                                  | onfiguration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace  |
|                                  | logging                                                                                              |
| model_repository_path[0]         | triton_model_repo/                                                                                   |
| model_control_mode               | MODE_NONE                                                                                            |
| strict_model_config              | 1                                                                                                    |
| rate_limit                       | OFF                                                                                                  |
| pinned_memory_pool_byte_size     | 268435456                                                                                            |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                             |
| min_supported_compute_capability | 6.0                                                                                                  |
| strict_readiness                 | 1                                                                                                    |
| exit_timeout                     | 30                                                                                                   |
| cache_enabled                    | 0                                                                                                    |
+----------------------------------+------------------------------------------------------------------------------------------------------+
I0107 12:18:03.744520 4223 grpc_server.cc:2495] Started GRPCInferenceService at 0.0.0.0:8001
I0107 12:18:03.744823 4223 http_server.cc:4619] Started HTTPService at 0.0.0.0:8000
I0107 12:18:03.804746 4223 http_server.cc:282] Started Metrics Service at 0.0.0.0:8002
root@d0b11d0dea8b:/tensorrtllm_backend# curl -X POST localhost:8000/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": "", "pad_id": 2, "end_id": 2}'
[TensorRT-LLM][ERROR] Encountered an error in forward function: Input tensor 'host_sink_token_length' not found; expected shape: (1) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:124)
1 0x7f830f5793e3 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1273e3) [0x7f830f5793e3]
2 0x7f830f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f830f4cbeb1]
3 0x7f830f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f830f4ccfa6]
4 0x7f830f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f830f4d0f0d]
5 0x7f830f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f830f4bba28]
6 0x7f830f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f830f4bffb5]
7 0x7f838604f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f838604f253]
8 0x7f8385ddfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f8385ddfac3]
9 0x7f8385e71660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f8385e71660]
[TensorRT-LLM][ERROR] Encountered error for requestId 1804289384: Encountered an error in forward function: Input tensor 'host_sink_token_length' not found; expected shape: (1) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:124)
1 0x7f830f5793e3 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1273e3) [0x7f830f5793e3]
2 0x7f830f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f830f4cbeb1]
3 0x7f830f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f830f4ccfa6]
4 0x7f830f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f830f4d0f0d]
5 0x7f830f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f830f4bba28]
6 0x7f830f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f830f4bffb5]
7 0x7f838604f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f838604f253]
8 0x7f8385ddfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f8385ddfac3]
9 0x7f8385e71660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f8385e71660]
[TensorRT-LLM][WARNING] Step function failed, continuing.
{"error":"in ensemble 'ensemble', Encountered error for requestId 1804289384: Encountered an error in forward function: Input tensor 'host_sink_token_length' not found; expected shape: (1) (/app/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:124)\n1 0x7f830f5793e3 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x1273e3) [0x7f830f5793e3]\n2 0x7f830f4cbeb1 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x79eb1) [0x7f830f4cbeb1]\n3 0x7f830f4ccfa6 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7afa6) [0x7f830f4ccfa6]\n4 0x7f830f4d0f0d /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x7ef0d) [0x7f830f4d0f0d]\n5 0x7f830f4bba28 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x69a28) [0x7f830f4bba28]\n6 0x7f830f4bffb5 /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so(+0x6dfb5) [0x7f830f4bffb5]\n7 0x7f838604f253 /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7f838604f253]\n8 0x7f8385ddfac3 /lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7f8385ddfac3]\n9 0x7f8385e71660 /lib/x86_64-linux-gnu/libc.so.6(+0x126660) [0x7f8385e71660]"}