Closed: kamillle closed this 2 months ago
Hi @kamillle, thank you for your attention and valuable suggestions. Support for LoRA is on our roadmap, please stay tuned. https://github.com/sgl-project/sglang/issues/634
@zhyncs I'm looking forward to it!! Thank you.
Motivation
By serving multiple LoRA adapters, a single inference server can exhibit multiple distinct behaviors. This can reduce the number of servers needed for deployment, leading to cost savings. From a training perspective, since there is no need to fine-tune the entire model, we can iterate through experimental cycles more quickly.
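To make the motivation concrete, here is a minimal NumPy sketch (not sglang's actual implementation) of the LoRA idea: a frozen base weight `W` plus a low-rank update `(alpha/r) * B @ A`, where only the small matrices `A` and `B` are trained. Each adapter is tiny, which is why many of them can be hosted on one server and swapped per request. All names and shapes below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: LoRA replaces a full fine-tune of W with a
# low-rank update, W_eff = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 16, 2, 4  # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable, r x d_in
B = np.zeros((d_out, r))                # trainable, zero-initialized

def lora_forward(x, W, A, B, alpha, r):
    """Base linear layer plus the low-rank LoRA update."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d_in))

# With B = 0 the adapter is a no-op: output matches the base layer,
# so a freshly initialized adapter does not perturb the model.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# Trainable parameters per adapter vs. a full fine-tune of this layer:
print(A.size + B.size, "vs", W.size)  # 48 vs 128
```

Even in this toy layer the adapter holds 48 parameters against 128 for the full weight; at transformer scale the ratio is far more dramatic, which is what makes per-adapter training and multi-adapter serving cheap.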
Related resources
vLLM