Open Yard1 opened 6 months ago
Is it possible to add "GPT-NeoX" as well?
How about "chatglm"?
@Yard1 I'm interested in extending this to other architectures. Do you want to meet and talk about the problems that need to be solved to get it working?
+1, bump for Phi-3.0 (any Phi right now)
+1 for Qwen
+1 for Qwen
Currently, multi-LoRA supports only Llama and Mistral architectures. We should extend this functionality to all architectures.
Yi, Qwen, Phi and Mixtral architectures seem to be the most demanded right now.
One challenge will be ensuring that all allowed weight shapes are supported by punica kernels. We may need to investigate some sort of padding there.
Originally posted by @Yard1 in https://github.com/vllm-project/vllm/issues/1804#issuecomment-1882913208
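To make the padding idea above concrete, here is a minimal sketch of zero-padding a LoRA weight's output dimension up to the nearest kernel-supported size. The function names and the supported-size list are illustrative assumptions, not vLLM's actual punica-kernel shape table; zero rows are used because they leave matmul results unchanged.

```python
# Hypothetical sketch of shape padding for punica-style kernels.
# The supported_sizes list below is an assumption for illustration,
# NOT the real set of shapes the punica kernels accept.

def next_supported_size(dim: int, supported_sizes: list[int]) -> int:
    """Return the smallest supported size >= dim, or raise if none fits."""
    for s in sorted(supported_sizes):
        if s >= dim:
            return s
    raise ValueError(f"dim {dim} exceeds largest supported size")

def pad_lora_weight(weight: list[list[float]],
                    supported_sizes: list[int]) -> list[list[float]]:
    """Zero-pad a 2-D weight (row-major list of rows) along the output
    dimension so the kernel sees a supported shape. Appended zero rows
    contribute nothing to the matmul, so results are unchanged."""
    out_dim = len(weight)
    in_dim = len(weight[0]) if weight else 0
    target = next_supported_size(out_dim, supported_sizes)
    padding = [[0.0] * in_dim for _ in range(target - out_dim)]
    return weight + padding

# Example: a 5x4 LoRA weight padded up to 8 rows.
w = [[1.0] * 4 for _ in range(5)]
padded = pad_lora_weight(w, supported_sizes=[8, 16, 32])
print(len(padded), len(padded[0]))  # 8 4
```

The same idea would apply per-architecture: look up each model's hidden/intermediate sizes, round them up to a kernel-supported shape, and slice the padding back off after the kernel call.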