triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

FasterTransformer freezes on 4 GPUs while running GPT with NCCL_LAUNCH_MODE=GROUP #27

Closed: saramcallister closed this issue 2 years ago

saramcallister commented 2 years ago

I'm running Triton Inference Server on a machine with 4 GPUs (no pipeline parallelism). Following the GPT guide, I can run inference with tensor parallelism = 2 (so using only 2 of the GPUs). However, if I follow the same steps with tensor parallelism = 4 across all 4 GPUs, every inference request freezes, similar to https://github.com/triton-inference-server/fastertransformer_backend/issues/19, except that in my case it happens even when NCCL_LAUNCH_MODE is set to GROUP or PARALLEL.
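For reference, a rough sketch of the setup (the image name and mount paths below are placeholders, not my exact values; tensor_para_size is the parameter from the GPT guide's config.pbtxt):

# In all_models/fastertransformer/config.pbtxt, tensor parallelism is
# raised from 2 to 4 (placeholder path; structure as in the GPT guide):
#   parameters {
#     key: "tensor_para_size"
#     value: { string_value: "4" }
#   }

# Server launch with the NCCL launch mode set explicitly
# (<triton_ft_image> is a placeholder for the built backend image):
docker run --gpus=all --rm --shm-size=4g \
  -e NCCL_LAUNCH_MODE=GROUP \
  -v ${PWD}/all_models:/models \
  <triton_ft_image> \
  tritonserver --model-repository=/models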

The GPUs also show full utilization (according to nvidia-smi) until I kill the container, potentially hours later and long after the timeout window, even though the server reports no in-flight requests for any model at that time.
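For anyone reproducing this, one way to watch the hang, assuming the default port mapping of 8000 for HTTP and 8002 for metrics:

# GPUs stay pinned at full utilization even though nothing is queued:
nvidia-smi --query-gpu=index,utilization.gpu --format=csv -l 5

# The server still reports ready, and the success counter in the
# Prometheus metrics stops advancing:
curl -s localhost:8000/v2/health/ready
curl -s localhost:8002/metrics | grep nv_inference_request_success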

logs:

W0706 20:19:47.394019 877 libfastertransformer.cc:999] before ThreadForward 0
W0706 20:19:47.394158 877 libfastertransformer.cc:1006] after ThreadForward 0
W0706 20:19:47.394177 877 libfastertransformer.cc:999] before ThreadForward 1
W0706 20:19:47.394287 877 libfastertransformer.cc:1006] after ThreadForward 1
W0706 20:19:47.394303 877 libfastertransformer.cc:999] before ThreadForward 2
I0706 20:19:47.394317 877 libfastertransformer.cc:834] Start to forward
I0706 20:19:47.394388 877 libfastertransformer.cc:834] Start to forward
W0706 20:19:47.394424 877 libfastertransformer.cc:1006] after ThreadForward 2
W0706 20:19:47.394444 877 libfastertransformer.cc:999] before ThreadForward 3
I0706 20:19:47.394530 877 libfastertransformer.cc:834] Start to forward
W0706 20:19:47.394565 877 libfastertransformer.cc:1006] after ThreadForward 3
I0706 20:19:47.394651 877 libfastertransformer.cc:834] Start to forward

byshiue commented 2 years ago

Please follow the bug_report.yml template and provide the steps to reproduce, thanks.

saramcallister commented 2 years ago

Closing; the bug report with reproduction steps is now in https://github.com/triton-inference-server/fastertransformer_backend/issues/28.