What tensor_para_size and pipeline_para_size do you use? Have you set NCCL_LAUNCH_MODE as the guide suggests?
tensor_para_size is 4 and pipeline_para_size is 1; I haven't tried NCCL_LAUNCH_MODE yet, but I will try it.
You can try both PARALLEL and GROUP.
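For reference, a minimal sketch of how the variable could be set so that the server process (and the FasterTransformer ranks it spawns) picks it up. The standalone tritonserver binary and the model repository path are placeholders; adjust them to your deployment (e.g. a docker or mpirun wrapper instead):

```python
import os
import subprocess

# NCCL_LAUNCH_MODE is read from the environment when NCCL initializes,
# so it must be set on the process that launches the Triton server.
env = dict(os.environ)
env["NCCL_LAUNCH_MODE"] = "GROUP"   # or "PARALLEL" (the NCCL default)

# Placeholder launch command; replace binary and model-repository path
# with whatever your setup actually uses.
subprocess.run(
    ["tritonserver", "--model-repository=/workspace/model_repository"],
    env=env,
    check=True,
)
```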
GROUP seems to solve this issue; PARALLEL has the same issue (which makes sense, because I believe PARALLEL is the default).
Running into an issue where, after sending a few requests in succession, FasterTransformer on Triton locks up; the logs look like this

I've left it in this state for over an hour and it still hangs. Interestingly, some of the GPUs still show 100% GPU utilization in nvidia-smi. It is also flaky: it doesn't happen after the same number of requests each time. I am using 4 A100s. Happy to provide more information as needed.
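A minimal sketch of the kind of back-to-back request loop described above, assuming the Triton HTTP client; the model name, input names, shapes, and dtypes are placeholders and would need to match the model's config.pbtxt:

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

def make_inputs():
    # Placeholder inputs for a FasterTransformer GPT-style model;
    # adjust names, shapes, and dtypes to the actual config.pbtxt.
    input_ids = httpclient.InferInput("input_ids", [1, 8], "UINT32")
    input_ids.set_data_from_numpy(np.ones((1, 8), dtype=np.uint32))
    output_len = httpclient.InferInput("request_output_len", [1, 1], "UINT32")
    output_len.set_data_from_numpy(np.full((1, 1), 32, dtype=np.uint32))
    return [input_ids, output_len]

# Send requests back to back; the hang appears after a
# non-deterministic number of them.
for i in range(50):
    client.infer(model_name="fastertransformer", inputs=make_inputs())
    print(f"request {i} returned")
```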