Closed ZX-ModelCloud closed 3 months ago
As far as I know, the RTX 40 series is the only line of modern Nvidia GPUs that does not support P2P. Shame on Nvidia. Attempting to force P2P on them will cause errors.
I stumbled across this issue and used the following dirty workaround. It won't give you p2p and thus degrades performance, but the model will still be split across the cards.
1. Comment out the line where `monkey_patch_vllm_p2p_access_check()` is called in `python/sglang/srt/managers/controller/model_runner.py`
2. `export NCCL_IGNORE_DISABLED_P2P=1`, as described in https://github.com/vllm-project/vllm/issues/406

My bad, didn't realize that I'm looking at a pull request instead of an issue.
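The two steps above can be sketched as a shell session; the exact call site to comment out may differ across sglang versions, and the launch command below is only an illustrative example:

```shell
# Step 1 (manual edit): in python/sglang/srt/managers/controller/model_runner.py,
# comment out the call to monkey_patch_vllm_p2p_access_check().

# Step 2: tell NCCL to fall back to non-P2P transports (shared memory / host
# staging) instead of erroring out on GPUs with P2P disabled:
export NCCL_IGNORE_DISABLED_P2P=1

# Then launch the server as usual, e.g. split across two cards
# (hypothetical invocation, adjust model path and flags to your setup):
# python -m sglang.launch_server --model-path <model> --tp 2
```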
Alternatively p2p can be enabled for 4090 GPUs with this fork of the gpu kernel modules (have not tried it yet): https://github.com/tinygrad/open-gpu-kernel-modules
It works! Tested the tinycorp Nvidia driver and NCCL/P2P works for the 4090 (albeit slowly).
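To confirm whether the patched driver actually exposes peer-to-peer access, a quick check with PyTorch's CUDA API (requires a CUDA build of PyTorch and at least two GPUs) might look like this:

```python
# Sketch: query the driver for P2P capability between GPU 0 and GPU 1.
# torch.cuda.can_device_access_peer returns True only if the driver
# reports that device 0 can directly access device 1's memory.
import torch

if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
    print("P2P 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
else:
    print("Need at least two CUDA GPUs to test P2P.")
```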
Stacktrace: