duguwanglong opened this issue 4 months ago (status: Open)
same problem
See https://github.com/vllm-project/vllm/issues/1968; the solution to that earlier issue may apply here as well.
Try unsetting TORCH_DISTRIBUTED_DEBUG.
No change; the issue still occurs. I use the command line to run inference:
[root@localhost ~]# unset TORCH_DISTRIBUTED_DEBUG
I also tried setting TORCH_DISTRIBUTED_DEBUG to INFO, OFF, and DETAIL, but it doesn't matter.
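In case it helps others, here is a minimal sketch of clearing the variable from inside the launcher script itself, before vLLM or torch.distributed initializes. The model path is a placeholder, not from the original report; `os.environ.pop` is plain standard-library Python and is equivalent to running `unset` in the shell.

```python
import os

# Drop TORCH_DISTRIBUTED_DEBUG before any torch.distributed / vLLM
# initialization runs (same effect as `unset TORCH_DISTRIBUTED_DEBUG`).
os.environ.pop("TORCH_DISTRIBUTED_DEBUG", None)

from vllm import LLM  # import only after the environment is cleaned

# Placeholder checkpoint path; replace with the actual model.
llm = LLM(model="/path/to/model", tensor_parallel_size=4)
```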
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
model_param.update({"tensor_parallel_size": 4, "gpu_memory_utilization": 0.99})
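For context, a minimal sketch of how a parameter dict like this is typically passed to vLLM's LLM constructor; the checkpoint path and prompt are placeholders I added for illustration, not part of the original report.

```python
from vllm import LLM, SamplingParams

# Placeholder checkpoint path; replace with the real model.
model_param = {"model": "/path/to/model"}
model_param.update({"tensor_parallel_size": 4, "gpu_memory_utilization": 0.99})

# LLM accepts these as keyword arguments; tensor_parallel_size=4 shards
# the model across 4 GPUs, and gpu_memory_utilization=0.99 lets vLLM
# claim nearly all of each GPU's memory for weights and KV cache.
llm = LLM(**model_param)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```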