triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

How to test Metrics port for baichuan1-13b? #169

Open · Ajay-Wong opened this issue 8 months ago

Ajay-Wong commented 8 months ago

I successfully deployed the baichuan1-13b model, and it reports three ports. How can I test the Metrics Service at 0.0.0.0:8002?

```
I1118 07:23:26.676895 28532 grpc_server.cc:2451] Started GRPCInferenceService at 0.0.0.0:8001
I1118 07:23:26.677120 28532 http_server.cc:3558] Started HTTPService at 0.0.0.0:8000
I1118 07:23:26.732376 28532 http_server.cc:187] Started Metrics Service at 0.0.0.0:8002
```

I used the end_to_end_test.py script from the tools/inflight_batcher_llm directory to test port 8002, but it failed.

```
python end_to_end_test.py --max_input_len 1024 --dataset /mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/testset/processed_343_CEVAL-Math.json --protocol grpc -u localhost:8002
[INFO] Start testing on 343 prompts.
Traceback (most recent call last):
  File "/mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/tools/inflight_batcher_llm/end_to_end_test.py", line 233, in <module>
    test_functionality(client, prompts, output_lens)
  File "/mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/tools/inflight_batcher_llm/end_to_end_test.py", line 47, in test_functionality
    result = client.infer(model_name, inputs, request_id=str(i))
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1380, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:8002: Socket closed
```

I changed the port from 8002 to 8001, but it still fails.

```
python end_to_end_test.py --max_input_len 1024 --dataset /mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/testset/processed_343_CEVAL-Math.json --protocol grpc -u localhost:8001
[INFO] Start testing on 343 prompts.
Traceback (most recent call last):
  File "/mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/tools/inflight_batcher_llm/end_to_end_test.py", line 233, in <module>
    test_functionality(client, prompts, output_lens)
  File "/mnt/cephfs/workspace/speech/wangzhijian/TRTLLM/tensorrtllm_backend/tools/inflight_batcher_llm/end_to_end_test.py", line 59, in test_functionality
    result = client.infer(model_name, inputs, request_id=str(i))
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 1380, in infer
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_utils.py", line 77, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNIMPLEMENTED] ModelInfer RPC doesn't support models with decoupled transaction policy
```

byshiue commented 8 months ago

If you want to change the port, you should also change the corresponding protocol.

For the end_to_end example, you only need to add -i grpc; the request will then be sent through gRPC on port 8001.
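
Side note on the second traceback above: the UNIMPLEMENTED error comes from calling the blocking client.infer() on a model whose config declares a decoupled transaction policy, and decoupled models have to be queried through Triton's streaming gRPC API instead. Below is a minimal sketch of that call path with tritonclient; the model name tensorrt_llm and the input tensor names are assumptions, so adjust them to match your config.pbtxt.

```python
# Minimal sketch: streaming gRPC inference against a decoupled Triton model.
# Assumed (not from this thread): model name "tensorrt_llm" and the input
# tensors "input_ids", "input_lengths", "request_output_len".
from functools import partial
import queue

import numpy as np
import tritonclient.grpc as grpcclient


def on_response(result_queue, result, error):
    # Streamed responses (or errors) arrive via this callback.
    result_queue.put(error if error is not None else result)


responses = queue.Queue()
client = grpcclient.InferenceServerClient("localhost:8001")

input_ids = np.array([[1, 2, 3]], dtype=np.int32)
input_lengths = np.array([[input_ids.shape[1]]], dtype=np.int32)
request_output_len = np.array([[16]], dtype=np.int32)

inputs = []
for name, data in (("input_ids", input_ids),
                   ("input_lengths", input_lengths),
                   ("request_output_len", request_output_len)):
    tensor = grpcclient.InferInput(name, data.shape, "INT32")
    tensor.set_data_from_numpy(data)
    inputs.append(tensor)

client.start_stream(callback=partial(on_response, responses))
client.async_stream_infer("tensorrt_llm", inputs, request_id="1")
client.stop_stream()  # close the stream; responses were pushed to the queue by the callback

while not responses.empty():
    print(responses.get())
```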

Ajay-Wong commented 8 months ago

> If you want to change the port, you should also change the corresponding protocol.
>
> For the end_to_end example, you only need to add -i grpc; the request will then be sent through gRPC on port 8001.

I've browsed through all directories but couldn't find any scripts to test the Metrics port. Which scripts can I use for testing?

byshiue commented 8 months ago

Here is the documentation: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md; hope it is helpful.
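
To add a bit of detail for anyone landing here: the Metrics Service on port 8002 is a plain HTTP endpoint that serves Prometheus-format text at /metrics, so it is not something you exercise with the gRPC inference client; `curl localhost:8002/metrics` is enough. A minimal Python sketch, assuming the server from this thread is running locally with the default metrics port:

```python
# Minimal sketch: scrape Triton's Prometheus metrics endpoint over HTTP.
# Assumes the server is local and metrics are on the default port 8002.
import urllib.request

with urllib.request.urlopen("http://localhost:8002/metrics") as resp:
    metrics_text = resp.read().decode("utf-8")

# Print the inference request counters as a quick sanity check
# (metric names such as nv_inference_request_success are emitted by Triton).
for line in metrics_text.splitlines():
    if line.startswith("nv_inference_request"):
        print(line)
```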

Ajay-Wong commented 8 months ago

> Here is the documentation: https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md; hope it is helpful.

thanks