Closed: kamillle closed this 2 months ago
Hi @kamillle, thank you for your attention and valuable suggestions. Support for LoRA is on our roadmap, please stay tuned. https://github.com/sgl-project/sglang/issues/634
@zhyncs I'm looking forward to it!! Thank you.
Motivation
By serving multiple LoRA adapters, a single inference server can exhibit multiple distinct behaviors. This can reduce the number of servers needed for deployment, leading to cost savings. From a training perspective, since there is no need to fine-tune the entire model, we can iterate through experimental cycles more quickly.
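To make the motivation concrete, here is a minimal NumPy sketch (not sglang's actual implementation) of the LoRA idea: a frozen base weight `W` plus a low-rank update `(alpha/r) * B @ A`, where only the small matrices `A` and `B` are trained. Each adapter is tiny, which is why many of them can be hosted on one server and swapped per request. All names and shapes below are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: LoRA replaces a full fine-tune of W with a
# low-rank update, W_eff = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 16, 2, 4  # rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable, r x d_in
B = np.zeros((d_out, r))                # trainable, zero-initialized

def lora_forward(x, W, A, B, alpha, r):
    """Base linear layer plus the low-rank LoRA update."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.standard_normal((1, d_in))

# With B = 0 the adapter is a no-op: output matches the base layer,
# so a freshly initialized adapter does not perturb the model.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# Trainable parameters per adapter vs. a full fine-tune of this layer:
print(A.size + B.size, "vs", W.size)  # 48 vs 128
```

Even in this toy layer the adapter holds 48 parameters against 128 for the full weight; at transformer scale the ratio is far more dramatic, which is what makes per-adapter training and multi-adapter serving cheap.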
Related resources
vLLM