Closed — jiuzhangsy closed this issue 1 month ago
This example is a great starting point, and you can set tensor_parallel_size
for multi-GPU inference. For example:
```python
import vllm

# MODEL_PATH is a placeholder for your local or Hugging Face model path.
llm = vllm.LLM(
    MODEL_PATH,
    enable_lora=True,            # allow serving LoRA adapters
    max_num_seqs=16,             # max concurrent sequences per batch
    max_loras=2,                 # LoRA adapters kept in GPU memory at once
    trust_remote_code=True,
    gpu_memory_utilization=0.3,  # fraction of each GPU's memory vLLM may use
    tensor_parallel_size=4,      # shard the model across 4 GPUs
)
```
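Once the engine is up, individual requests can target different adapters by passing a LoRARequest at generate time. A minimal sketch; the adapter names, integer IDs, and paths below are placeholders you would replace with your own:

```python
from vllm.lora.request import LoRARequest

sampling_params = vllm.SamplingParams(temperature=0.0, max_tokens=128)

# Each LoRARequest takes a name, a unique integer ID, and the adapter path.
# "adapter_a"/"adapter_b" and the /path/to/... values are hypothetical.
out_a = llm.generate(
    ["Prompt routed to the first adapter"],
    sampling_params,
    lora_request=LoRARequest("adapter_a", 1, "/path/to/lora_a"),
)
out_b = llm.generate(
    ["Prompt routed to the second adapter"],
    sampling_params,
    lora_request=LoRARequest("adapter_b", 2, "/path/to/lora_b"),
)
```

With max_loras=2 as above, up to two adapters stay resident in GPU memory at a time, so requests against either adapter can be served without reloading.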
🚀 The feature, motivation and pitch
I need to run inference with vLLM across multiple GPUs and manage multiple LoRA adapters. Can anyone help? Thanks very much.
Alternatives
No response
Additional context
No response