joaopcm1996 opened 8 months ago
Can you add a line in your script to download the repo to a local path and run from there?
For instance, you can add lines like the following before running vLLM inference:

```python
from huggingface_hub import snapshot_download

lora_path = snapshot_download(repo_id="yard1/llama-2-7b-sql-lora-test")
```
Yes, I did this here by downloading all the adapters to disk before launching vLLM. However, because all adapter ids and their corresponding local paths must be defined statically at launch, no new adapters can be loaded without relaunching the server. It also means the number of adapters that can be served is limited by the server's disk space, since there is no eviction from disk that I am aware of at this point. This could be improved so that the same endpoint stays provisioned while new adapters are loaded dynamically from remote object storage.
I definitely agree with this idea. I am considering using https://github.com/predibase/lorax for this reason, but other than this feature I highly prefer vLLM.
I would also prefer vLLM over LoRAX, which looks a lot like TGI. Ideally there could be a caching parameter for vLLM that downloads an adapter on demand and deletes it after x amount of time if it hasn't been used. Of course, this needs some state management.
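The idle-timeout cache suggested above could be sketched like this (a hypothetical illustration, not an existing vLLM feature; a real version would also delete the adapter's files from disk and unregister it from the engine):

```python
import time


class AdapterCache:
    """Track last-use times and evict adapters idle longer than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable for testing
        self._last_used = {}        # adapter_id -> last access timestamp

    def touch(self, adapter_id: str) -> None:
        """Record that an adapter was just used (e.g. served a request)."""
        self._last_used[adapter_id] = self.clock()

    def evict_idle(self) -> list:
        """Drop adapters idle longer than ttl; return the evicted ids."""
        now = self.clock()
        stale = [a for a, t in self._last_used.items() if now - t > self.ttl]
        for a in stale:
            del self._last_used[a]  # real version: also remove files on disk
        return stale
```

A background thread calling `evict_idle()` periodically would keep disk usage bounded while letting hot adapters stay resident.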
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
🚀 The feature, motivation and pitch
Request for dynamic download of LoRA adapters from S3 or the HF Hub, based on which model/adapter id is passed in the request.

Alternatives
No alternatives as of today; adapters need to be downloaded to the server upfront and be locally available.
Additional context
No response