vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

Add multi-LoRA support for more architectures #2602

Open Yard1 opened 6 months ago

Yard1 commented 6 months ago

Currently, multi-LoRA supports only Llama and Mistral architectures. We should extend this functionality to all architectures.

Yi, Qwen, Phi and Mixtral architectures seem to be the most in demand right now.

One challenge will be ensuring that all allowed weight shapes are supported by the punica kernels. We may need to investigate some sort of padding there.

Originally posted by @Yard1 in https://github.com/vllm-project/vllm/issues/1804#issuecomment-1882913208
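
To illustrate the padding idea mentioned above, here is a minimal sketch of zero-padding a LoRA weight's output dimension up to a kernel-supported size. The helper name and the list of supported sizes are hypothetical, not vLLM's actual punica integration:

```python
import torch

# Illustrative set of output dims the kernels are assumed to handle
# (hypothetical values, not the real punica kernel shape list).
SUPPORTED_OUT_DIMS = [2560, 4096, 5120, 8192]

def pad_lora_b(lora_b: torch.Tensor) -> torch.Tensor:
    """Zero-pad the output dim of a LoRA B matrix (out_features x rank)
    up to the next supported size, leaving the rank dimension unchanged."""
    out_features, rank = lora_b.shape
    target = next((d for d in SUPPORTED_OUT_DIMS if d >= out_features), None)
    if target is None:
        raise ValueError(f"No supported padded size for out_features={out_features}")
    if target == out_features:
        return lora_b
    padded = lora_b.new_zeros((target, rank))
    padded[:out_features] = lora_b
    return padded
```

The extra rows are zeros, so the padded region contributes nothing to the LoRA delta; the output just needs to be sliced back to the original size after the kernel runs.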

TaeWoo21 commented 6 months ago

Is it possible to add "GPT-NeoX" as well?

Cloopen-ReLiNK commented 6 months ago

How about ChatGLM?

FurtherAI commented 5 months ago

@Yard1 I'm interested in extending this to other architectures. Do you want to meet and talk about the problems that need to be solved to get it working?

nightflight-dk commented 3 months ago

+1, bump for Phi-3 (any Phi model right now)

jjjjohnson commented 2 months ago

+1 for Qwen

jiauy commented 1 month ago

+1 for Qwen