triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

FasterTransformer might freeze after a few requests #19

Closed jimwu6 closed 2 years ago

jimwu6 commented 2 years ago

I'm running into an issue where, after sending a few requests in succession, FasterTransformer on Triton locks up; the logs look like this:

W0406 20:15:51.091033 52 libfastertransformer.cc:1092] the data is in CPU
W0406 20:15:51.091045 52 libfastertransformer.cc:1096] the data is in CPU
W0406 20:15:51.091054 52 libfastertransformer.cc:975] before ThreadForward 0
W0406 20:15:51.091200 52 libfastertransformer.cc:977] after ThreadForward 0
W0406 20:15:51.091220 52 libfastertransformer.cc:975] before ThreadForward 1
W0406 20:15:51.091274 52 libfastertransformer.cc:977] after ThreadForward 1
W0406 20:15:51.091288 52 libfastertransformer.cc:975] before ThreadForward 2
W0406 20:15:51.091357 52 libfastertransformer.cc:977] after ThreadForward 2
W0406 20:15:51.091363 52 libfastertransformer.cc:856] Start to forward
W0406 20:15:51.091366 52 libfastertransformer.cc:856] Start to forward
W0406 20:15:51.091374 52 libfastertransformer.cc:975] before ThreadForward 3
W0406 20:15:51.091407 52 libfastertransformer.cc:856] Start to forward
W0406 20:15:51.091517 52 libfastertransformer.cc:977] after ThreadForward 3
W0406 20:15:51.091583 52 libfastertransformer.cc:856] Start to forward

I've left it in this state for over an hour and it still hangs. Interestingly, some of the GPUs still show 100% utilization in nvidia-smi. The issue is also flaky: it doesn't occur after the same number of requests each time. I am using 4 A100s.

Happy to provide more information as needed.

byshiue commented 2 years ago

What tensor_para_size and pipeline_para_size are you using? Have you set NCCL_LAUNCH_MODE as the guide suggests?

jimwu6 commented 2 years ago

tensor_para_size is 4 and pipeline_para_size is 1; I haven't tried NCCL_LAUNCH_MODE yet, but will.
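For readers hitting the same problem: in the fastertransformer_backend these two values live in the model's `config.pbtxt` as string-valued parameters. A minimal sketch of the relevant fragment (the surrounding model name, inputs, and outputs are omitted, and the values here match this thread's setup):

```protobuf
# Fragment of config.pbtxt for the fastertransformer backend.
# tensor_para_size * pipeline_para_size must equal the number of GPUs used.
parameters {
  key: "tensor_para_size"
  value: { string_value: "4" }
}
parameters {
  key: "pipeline_para_size"
  value: { string_value: "1" }
}
```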

byshiue commented 2 years ago

You can try both PARALLEL and GROUP.
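For anyone following along: `NCCL_LAUNCH_MODE` is an NCCL environment variable, so it has to be set in the environment of the `tritonserver` process before NCCL initializes. A minimal sketch (the Docker invocation and model-repository path are placeholders):

```shell
# NCCL reads NCCL_LAUNCH_MODE at initialization, so export it before
# starting the server process:
export NCCL_LAUNCH_MODE=GROUP

# When launching Triton in Docker, pass it through with -e instead
# (image and model path are placeholders):
#   docker run --gpus=all -e NCCL_LAUNCH_MODE=GROUP ... tritonserver --model-repository=/models

echo "NCCL_LAUNCH_MODE=$NCCL_LAUNCH_MODE"
```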

jimwu6 commented 2 years ago

GROUP seems to solve the issue; PARALLEL still hangs (which makes sense, since I believe PARALLEL is the default).