Closed · hxujal closed 5 months ago
I couldn't load the model using 1 GPU.
Set `--tensor-parallel-size 2`.
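For reference, a minimal launch sketch, assuming a machine with 2 GPUs and the OpenAI-compatible server entrypoint; the model name is a placeholder you'd replace with your own:

```shell
# Assumes 2 GPUs are visible to the process; <your-model> is a placeholder.
python -m vllm.entrypoints.openai.api_server \
    --model <your-model> \
    --tensor-parallel-size 2
```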
How do I specify which GPU to run on? For example, how can I select only `cuda:0` or `cuda:1`?
@peacefulluo FYI https://github.com/vllm-project/vllm/issues/2387
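As discussed in that issue, the usual way is `CUDA_VISIBLE_DEVICES`, which restricts which physical GPUs the process can see. A sketch (the launch command below is commented out; the model name is a placeholder):

```shell
# With CUDA_VISIBLE_DEVICES=1, only physical GPU 1 is visible to the
# process, and it appears as cuda:0 inside it.
export CUDA_VISIBLE_DEVICES=1
echo "$CUDA_VISIBLE_DEVICES"
# Then launch as usual, e.g.:
# python -m vllm.entrypoints.openai.api_server --model <your-model>
```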
> @peacefulluo FYI #2387

Thank you!
Set `--tensor-parallel-size 2`.
Hi, I'd like to know what advantage running my LLM on 2 GPUs brings. Is it faster, or does it just split the model into two parts across different devices?
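As I understand it, the main benefit is memory: tensor parallelism shards each layer's weight matrix across the GPUs, so a model too large for one device can fit, and it can also reduce per-token latency at the cost of inter-GPU communication. A toy sketch of the sharding idea in plain Python (no GPUs involved; all names here are illustrative, not vLLM internals):

```python
# Toy illustration of tensor parallelism: a linear layer's weight matrix is
# split column-wise across two "workers", each computes a partial output,
# and the shards are concatenated -- the same result as one big matmul.

def matmul(x, w):
    # x: length-k vector (list), w: k x n matrix (list of rows)
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    # Split w column-wise into `parts` equal shards.
    width = len(w[0]) // parts
    return [[row[p * width:(p + 1) * width] for row in w] for p in range(parts)]

x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]

shards = split_columns(w, 2)                # each "GPU" holds half the columns
partials = [matmul(x, s) for s in shards]   # computed independently
y_parallel = partials[0] + partials[1]      # gather = concatenation here
assert y_parallel == matmul(x, w)           # matches the single-device result
```

Each worker only ever stores half the weights, which is why `--tensor-parallel-size 2` lets a model load when one GPU alone runs out of memory.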