Open victorsoda opened 4 months ago
In tensorrt_llm_backend, when we launch several servers via MPI with world_size > 1, only rank 0 (the main process) will receive/return requests. The other ranks skip this step, so they never run into the same-port issue. You need to do something similar if you want to use a self-defined backend.
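For a self-defined Python backend, the same idea looks roughly like this: only rank 0 returns from initialize() and answers Triton requests, while every other rank stays inside a worker loop, so its tritonserver process never finishes loading the model and never starts the HTTP/gRPC/metrics endpoints. A minimal sketch (mpi4py-based; build_runner, runner.step, and the tensor names are hypothetical placeholders for your own engine code, not a real API):

```python
# Minimal sketch of a custom Python-backend model.py following the same pattern
# as the TensorRT-LLM backend: only MPI rank 0 serves Triton requests.
from mpi4py import MPI
import triton_python_backend_utils as pb_utils


class TritonPythonModel:

    def initialize(self, args):
        self.comm = MPI.COMM_WORLD
        self.rank = self.comm.Get_rank()
        # Each rank loads its own shard of the TP engine (as in run.py).
        self.runner = build_runner(rank=self.rank)  # hypothetical helper

        if self.rank != 0:
            # Ranks != 0 never return from initialize(): their tritonserver
            # process never finishes loading the model, so it never starts
            # the HTTP/gRPC/metrics endpoints and never fights over ports.
            while True:
                work = self.comm.bcast(None, root=0)
                if work is None:            # shutdown signal from rank 0
                    break
                self.runner.step(work)      # join the tensor-parallel step

    def execute(self, requests):
        # Only rank 0 ever gets here; it broadcasts each request to the
        # worker ranks before running its own shard of the computation.
        responses = []
        for request in requests:
            ids = pb_utils.get_input_tensor_by_name(request, "input_ids").as_numpy()
            self.comm.bcast({"input_ids": ids}, root=0)
            out = self.runner.step({"input_ids": ids})
            out_tensor = pb_utils.Tensor("output_ids", out)
            responses.append(pb_utils.InferenceResponse(output_tensors=[out_tensor]))
        return responses

    def finalize(self):
        if self.rank == 0:
            self.comm.bcast(None, root=0)   # release the worker ranks
```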
Any clue on how to resolve this issue? Please let me know.
I met the same error; any solutions?
I used world size 4 and it worked
I used world size 4 but it did not work; world size 2 worked.
Okay
In tensorrt_llm_backend, when we launch several servers via MPI with world_size > 1, only rank 0 (the main process) will receive/return requests. The other ranks skip this step, so they never run into the same-port issue. You need to do something similar if you want to use a self-defined backend.
Any examples? We have the same problem. We need to run trtllm in the Python backend with tp_size > 1 for a VLM model.
Question
The code in launch_triton_server.py:
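Roughly, the relevant part builds a single mpirun command that starts one tritonserver process per rank, all pointing at the same model repository and the same default ports (a simplified paraphrase with some flags omitted, not the exact file contents):

```python
# Simplified paraphrase of the idea in launch_triton_server.py: one mpirun
# invocation launches world_size tritonserver processes (MPMD syntax, ranks
# separated by ':'), each given the same model repository and default ports.
def get_cmd(world_size, tritonserver, model_repo):
    cmd = ["mpirun", "--allow-run-as-root"]
    for i in range(world_size):
        if i != 0:
            cmd += [":"]
        cmd += ["-n", "1", tritonserver, f"--model-repository={model_repo}"]
    return cmd
```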
When world_size = 2, for example, two Triton servers are launched using the same gRPC port (e.g., 8001). How can this possibly work? When I tried to do something similar, I got the following error while launching the second server:
Background
I've been developing my own Triton backend, drawing on https://github.com/triton-inference-server/tensorrtllm_backend.
I have already built two engines (tensor parallel, tp_size = 2) for the llama2-7b model. It works fine to run something like
mpirun -np 2 python3.8 run.py
to load the two engines, run tensor-parallel inference, and get correct results. My goal now is to run the same two engines with the Triton server.
I have already implemented the run.py logic in model.py (the initialize() and execute() functions) of my Python backend. Following launch_triton_server.py, I tried the following command line:
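Something of this shape, following the per-rank layout from launch_triton_server.py with world_size = 2 (the repository path below is a placeholder):

```
mpirun --allow-run-as-root \
    -n 1 tritonserver --model-repository=/path/to/my_python_backend_repo : \
    -n 1 tritonserver --model-repository=/path/to/my_python_backend_repo
```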
Then I got the error as above.
Could you please tell me what I did wrong and how I can fix the error? Thanks a lot!