vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

How to call/add a new lora module to a live server? #2916

Open jayteaftw opened 9 months ago

jayteaftw commented 9 months ago

Hi,

I was reading through the documentation for using LoRA in vLLM.

In the documentation, when the server is started, the available LoRA modules have to be specified up front: --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/

Is it possible to do this at runtime instead? That is, start the server and then call a newly added LoRA module without having to stop and restart the server?
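
For reference, here is a minimal sketch of the static approach described above, using vLLM's OpenAI-compatible server (the base model and adapter path are placeholders; adjust for your setup):

```bash
# Start the server with LoRA support enabled and one adapter
# registered at startup via --lora-modules (name=path pairs).
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/

# Requests can then target the adapter by the name given above.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "sql-lora", "prompt": "SELECT", "max_tokens": 32}'
```

The limitation being discussed is that the set of adapter names is fixed at startup; adding a new one requires restarting the server.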

prd-tuong-nguyen commented 9 months ago

@jayteaftw Agreed, this would be very useful.

Erikfcb commented 8 months ago

This would be super helpful.

AlphaINF commented 8 months ago

Marking this thread; I am working on adding this feature.

AlphaINF commented 8 months ago

https://github.com/vllm-project/vllm/pull/3446

Liucd0520 commented 2 months ago

Hello, has this problem been fixed yet? Currently, when I need to update the LoRA model, I have to stop the server, add the new LoRA model, and start the server again, which is very cumbersome. If LoRA models could be added to a running server, this problem would be solved.
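
Recent vLLM versions do expose runtime LoRA adapter management on the OpenAI-compatible server, gated behind an environment variable. A hedged sketch, assuming a version that supports the /v1/load_lora_adapter and /v1/unload_lora_adapter endpoints (verify against your installed version's documentation; the adapter name and path below are placeholders):

```bash
# Opt in to runtime LoRA updates before starting the server (off by default).
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora

# Register a new adapter on the live server, no restart needed.
curl http://localhost:8000/v1/load_lora_adapter \
    -H "Content-Type: application/json" \
    -d '{"lora_name": "sql-lora-v2", "lora_path": "/path/to/new/adapter"}'

# Unregister an adapter that is no longer needed.
curl http://localhost:8000/v1/unload_lora_adapter \
    -H "Content-Type: application/json" \
    -d '{"lora_name": "sql-lora-v2"}'
```

Once loaded, the adapter can be targeted in requests by its lora_name, the same way as adapters registered at startup with --lora-modules.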