Closed Karthikreddyk99 closed 4 days ago
In SGLang, you can enable multi-GPU tensor parallelism by adding `--tp 2` when starting the server. Please see the documentation. Otherwise, you should be able to set `device_map='auto'` and define the GPUs in the environment using `CUDA_VISIBLE_DEVICES`.
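As a rough sketch of both options, the snippet below builds an SGLang launch command with `--tp 2` and shows a `device_map='auto'` load through Hugging Face Transformers. Several things here are assumptions, not taken from this thread: the model path `your-org/your-model` is a hypothetical placeholder, the helper names are made up for illustration, and `AutoModelForCausalLM` may not be the right class for this particular model (check the model card). The `device_map='auto'` path also needs the `accelerate` package installed.

```python
import os
import shlex

# Hypothetical placeholder: substitute the actual model path or Hub ID.
MODEL_PATH = "your-org/your-model"


def sglang_launch_command(model_path, tp_size=2):
    """Build the SGLang server command with tensor parallelism across tp_size GPUs."""
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp_size),
    ]


def gpu_env(gpu_ids):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def load_sharded(model_id, gpu_ids):
    """Alternative without SGLang: let Transformers/Accelerate place layers across GPUs.

    CUDA_VISIBLE_DEVICES must be set before CUDA is initialized, so it is
    set here before transformers (and torch) are imported.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    from transformers import AutoModelForCausalLM  # assumed model class

    return AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")


if __name__ == "__main__":
    cmd = sglang_launch_command(MODEL_PATH, tp_size=2)
    # Print the equivalent shell command; to actually start the server, run
    # subprocess.run(cmd, env=gpu_env([0, 1]), check=True) instead.
    print("CUDA_VISIBLE_DEVICES=0,1 " + shlex.join(cmd))
```

Note that `device_map='auto'` spreads layers across the visible GPUs to fit the model in memory. It is not tensor parallelism, so only one GPU may be busy at any given moment even when the model is correctly sharded. Running `nvidia-smi` and checking memory usage on each device is a quick way to confirm that more than one GPU is in use.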
We tried multiple ways of running the model on multiple GPUs, but it only uses a single GPU.
Can you provide the correct command to run the model on multiple GPUs?