[Bug]: a single LoRA request error makes all in-flight requests fail #4879

Open · jinzhen-lin opened 5 months ago

jinzhen-lin commented 5 months ago

Your current environment

The output of `python collect_env.py`

🐛 Describe the bug

vLLM loads LoRA checkpoints while executing the model:

https://github.com/vllm-project/vllm/blob/v0.4.2/vllm/worker/model_runner.py#L789-L790

https://github.com/vllm-project/vllm/blob/v0.4.2/vllm/lora/worker_manager.py#L138-L172
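To see why one bad adapter poisons the whole step, here is a tiny self-contained model of the batching behavior (plain Python, not vLLM's actual code): all adapters for a batch are activated up front inside the per-step execution path, so an adapter-load error thrown there fails every request in the step.

```python
# Toy model of the failure mode (plain Python, not vLLM's code).
MAX_LORA_RANK = 16

def load_lora(rank: int) -> None:
    # Stands in for the worker LoRA manager lazily loading a checkpoint.
    if rank > MAX_LORA_RANK:
        raise ValueError(f"LoRA rank {rank} > max_lora_rank {MAX_LORA_RANK}")

def execute_model_step(batch: list[dict]) -> list[str]:
    # Adapters for the whole batch are activated before the forward pass,
    # as in model_runner's set_active_loras(); one failure aborts the step.
    for req in batch:
        if req.get("lora_rank") is not None:
            load_lora(req["lora_rank"])
    return [f"output for {req['id']}" for req in batch]

batch = [
    {"id": "plain-request"},                    # no LoRA at all
    {"id": "bad-lora-request", "lora_rank": 64},
]
try:
    execute_model_step(batch)
except ValueError as e:
    # Both requests are lost, including the one that never used LoRA.
    print(f"whole step failed: {e}")
```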

So when loading a LoRA checkpoint raises an error (e.g. the adapter's LoRA rank exceeds `max_lora_rank`), all in-flight requests fail, regardless of whether the other requests use LoRA at all.
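For illustration, a minimal reproduction sketch against the v0.4.2 engine API (model name and adapter path are placeholders; the adapter is assumed to have rank 64 while `max_lora_rank` is 16):

```python
import asyncio
from typing import Optional

from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams
from vllm.lora.request import LoRARequest

async def main() -> None:
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(
            model="meta-llama/Llama-2-7b-hf",  # placeholder model
            enable_lora=True,
            max_lora_rank=16,
        )
    )
    params = SamplingParams(max_tokens=32)

    async def run(request_id: str,
                  lora_request: Optional[LoRARequest] = None) -> None:
        async for _ in engine.generate("Hello", params, request_id,
                                       lora_request=lora_request):
            pass

    # Scheduled concurrently, both requests land in the same engine step.
    # Loading the oversized adapter raises inside execute_model, and the
    # plain request fails along with the LoRA one.
    results = await asyncio.gather(
        run("plain"),
        run("bad-lora",
            LoRARequest("bad-adapter", 1, "/path/to/rank64-adapter")),
        return_exceptions=True,
    )
    print(results)

asyncio.run(main())
```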

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!