pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

[Feature request] Add support for LoRA adapter weights #65

Open chauhang opened 5 months ago

chauhang commented 5 months ago

For task-specific domain adaptation, support for LoRA adapter weights is needed across a variety of use cases for LLM and diffusion models:

  1. On mobile, where the base foundation model will be preloaded on the device, give each application the option to dynamically swap LoRA weights in and out for its current task -- e.g., text summarization, sentiment analysis, language translation, or image generation matching a selected artistic preference (e.g., animation-style images).
  2. For the laptop/desktop AI Copilot scenario, with the base foundation model preloaded, perform different tasks based on each application's context by applying the corresponding LoRA adapter weights, as on mobile.

Low latency when swapping adapter weights is a key requirement for the above use cases; recompiling the entire model is not a practical option because of the latencies involved.
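
A minimal sketch of the requested hot-swap flow, assuming the standard LoRA update W' = W + (alpha / r) * B @ A and hypothetical adapter tensors (nothing here reflects an existing torchchat API); the delta is merged into and unmerged from a base `nn.Linear` in place, so the compiled model is never rebuilt:

```python
import torch
import torch.nn as nn

def merge_lora(linear: nn.Linear, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> None:
    """Fold a LoRA delta into the base weight in place: W += (alpha / r) * B @ A."""
    with torch.no_grad():
        linear.weight.add_((alpha / r) * (B @ A))

def unmerge_lora(linear: nn.Linear, A: torch.Tensor, B: torch.Tensor,
                 alpha: float, r: int) -> None:
    """Undo a previously merged delta: W -= (alpha / r) * B @ A."""
    with torch.no_grad():
        linear.weight.sub_((alpha / r) * (B @ A))

# Hypothetical usage: one attention projection, two task adapters.
linear = nn.Linear(4096, 4096, bias=False)
r, alpha = 8, 16
adapters = {  # task -> (A, B), with A: (r, in_features), B: (out_features, r)
    "summarize": (torch.randn(r, 4096), torch.randn(4096, r)),
    "translate": (torch.randn(r, 4096), torch.randn(4096, r)),
}

A, B = adapters["summarize"]
merge_lora(linear, A, B, alpha, r)    # activate the summarization adapter
# ... run the task ...
unmerge_lora(linear, A, B, alpha, r)  # restore the base weight
A, B = adapters["translate"]
merge_lora(linear, A, B, alpha, r)    # swap in the translation adapter
```

One caveat with in-place merging: repeated merge/unmerge cycles in low precision accumulate rounding error, so an implementation may prefer to keep the pristine base weight around and recompute W + delta on each swap.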

mikekgfb commented 5 months ago

Can somebody from the Biz Eng. team contribute this feature, please? It may need some integration work with how we handle weights, since we need partial weight updates. Ideally this would be copy-on-write: we mmap the file, then copy just the pages that are modified by updates.
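
A minimal sketch of that copy-on-write behavior, assuming a hypothetical flat float32 weight file (`base_weights.bin`) rather than torchchat's actual checkpoint format; `mmap.ACCESS_COPY` gives a private mapping in which only the pages actually written are duplicated:

```python
import mmap
import numpy as np

# Stand-in "checkpoint": a flat file of 1024 float32 weights, all zeros.
np.zeros(1024, dtype=np.float32).tofile("base_weights.bin")

with open("base_weights.bin", "rb") as f:
    # ACCESS_COPY maps the file copy-on-write: writes land in process-private
    # pages, untouched pages stay shared with the page cache, and the
    # on-disk file is never modified.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

# Patch 16 floats (elements 100..115) with an adapter delta; the kernel
# duplicates only the page(s) these 64 bytes land on.
delta = np.ones(16, dtype=np.float32)
mm[400:464] = delta.tobytes()

print(np.frombuffer(mm[400:464], dtype=np.float32)[0])         # 1.0 in this mapping
print(np.fromfile("base_weights.bin", dtype=np.float32)[100])  # 0.0 on disk
```

Recent PyTorch releases also support `torch.load(..., mmap=True)` for memory-mapped loading of saved checkpoints, which could be a starting point for the page-level patching described above.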

We likely need a separate solution for mobile. cc: @malfet @angelayi @desertfire @JacobSzwejbka @iseeyuan