vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0

[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server #10546

Open angkywilliam opened 22 hours ago

angkywilliam commented 22 hours ago

In the current implementation, vLLM only supports loading LoRA from local storage.

At OpenPipe, we are extending the serving engine's capabilities by introducing a LoRAResolver. LoRAResolver enables vLLM users to implement custom logic for fetching LoRA files from remote servers.

For example, in OpenPipe's case, we dynamically load LoRA adapters for our customers from S3.
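A rough sketch of what such a resolver hook could look like is below. The PR's actual interface is not shown in this thread, so every name here (`LoRAResolver`, `resolve`, `CachingResolver`, the `fetch` callable) is hypothetical, and `fake_fetch` only stands in for a real download such as one from S3.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, Optional


class LoRAResolver(ABC):
    """Hypothetical hook: map an adapter name to a local directory."""

    @abstractmethod
    def resolve(self, lora_name: str) -> Optional[Path]:
        """Return a local path holding the adapter files, or None if unknown."""


class CachingResolver(LoRAResolver):
    """Fetches adapters on first use and caches them on local disk."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str, Path], bool]):
        # `fetch(name, dest)` is an assumed callable that downloads the adapter
        # files into `dest` and returns False when the adapter does not exist.
        self.cache_dir = cache_dir
        self.fetch = fetch

    def resolve(self, lora_name: str) -> Optional[Path]:
        dest = self.cache_dir / lora_name
        if dest.is_dir():
            return dest  # already downloaded
        dest.mkdir(parents=True)
        if not self.fetch(lora_name, dest):
            dest.rmdir()  # leave no empty directory behind for a miss
            return None
        return dest


def fake_fetch(name: str, dest: Path) -> bool:
    """Stand-in for a remote download (e.g. from S3); writes a dummy file."""
    if name != "customer-a":
        return False
    (dest / "adapter_config.json").write_text("{}")
    return True


cache_root = Path(tempfile.mkdtemp())
resolver = CachingResolver(cache_root, fake_fetch)
path = resolver.resolve("customer-a")   # triggers the (fake) download
missing = resolver.resolve("unknown")   # unknown adapter resolves to None
```

Keeping the download behind a small interface like this would let the serving engine keep treating adapters as local directories, while each deployment decides where the files come from.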

github-actions[bot] commented 22 hours ago

👋 Hi! Thank you for contributing to the vLLM project. Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

🚀

angkywilliam commented 18 hours ago

@simon-mo @khluu can I get help unblocking Buildkite? Thanks!

khluu commented 18 hours ago

Which test do you need to unblock? Or, if you want permission to unblock tests yourself, can I have the email associated with your GitHub account?

angkywilliam commented 17 hours ago

Hey @khluu, this is my first time opening a PR to vLLM.

Do I need to pass all the tests in Buildkite for the code to be reviewed and merged? If so, I would need help unblocking all the tests in Buildkite.

I also sent you an email with the subject "Unblock Dynamically Load LoRA from a Remote Server" so that you have my email address.

Thanks for the help!

jeejeelee commented 16 hours ago

IIUC, although this PR is related to LoRA loading, it seems you haven't touched the underlying LoRA logic. What you might need is to add unit tests similar to #6566. BTW, maybe you can refer to: https://github.com/vllm-project/vllm/blob/main/vllm/lora/worker_manager.py#L94
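A unit test for remote loading would typically stub out the network so it can run in CI. The following is only a sketch of that pattern using the standard library's `unittest.mock`; `load_lora_from_remote` and its `download` parameter are hypothetical and are not taken from this PR, from #6566, or from `worker_manager.py`.

```python
import tempfile
import unittest
from pathlib import Path
from unittest import mock


def load_lora_from_remote(name: str, cache_dir: Path, download) -> Path:
    """Hypothetical loader: download `name` into the cache unless already present."""
    dest = cache_dir / name
    if not dest.exists():
        dest.mkdir(parents=True)
        download(name, dest)  # assumed to write the adapter files into `dest`
    return dest


class RemoteLoRALoadingTest(unittest.TestCase):
    def test_download_happens_once(self):
        download = mock.Mock()  # stands in for the real network call
        with tempfile.TemporaryDirectory() as tmp:
            cache = Path(tmp)
            first = load_lora_from_remote("adapter-x", cache, download)
            second = load_lora_from_remote("adapter-x", cache, download)
        # Both calls resolve to the same directory, and only the first downloads.
        self.assertEqual(first, second)
        download.assert_called_once_with("adapter-x", first)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(RemoteLoRALoadingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Injecting the download function keeps the test free of credentials and network access, which matters for a short CI subset such as fastcheck.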

khluu commented 15 hours ago

Oh, if it's just for merging, you can wait until your PR is reviewed and approved; your PR reviewer can then run CI for you. Fastcheck is a subset of CI that runs short tests and gives you the flexibility to run extra specific tests if needed (which requires being in the Buildkite org).