pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile

[Feature request] Add support for LoRA adapter weights #65

Open chauhang opened 5 months ago

chauhang commented 5 months ago

For task-specific domain adaptation, support for LoRA adapter weights is needed across a variety of use cases for LLM and diffusion models:

  1. On mobile, where the base foundation model will be preloaded on the device, give each application the option to dynamically swap LoRA weights in and out for its current task -- e.g., text summarization, sentiment analysis, language translation, or image generation matching a selected artistic preference (e.g., animation-style images).
  2. For the laptop/desktop AI Copilot scenario, with the base foundation model preloaded, perform different tasks based on each application's context by applying the corresponding LoRA adapter weights, as on mobile.

Low latency when swapping adapter weights is a key requirement for the above use cases; recompiling the entire model is not a practical option because of the latencies involved.
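
A minimal sketch of the requested hot-swap flow, assuming the standard LoRA update W' = W + (alpha / r) * B @ A and hypothetical adapter tensors (nothing here reflects an existing torchchat API); the delta is merged into and unmerged from a base `nn.Linear` in place, so the compiled model is never rebuilt:

```python
import torch
import torch.nn as nn

def merge_lora(linear: nn.Linear, A: torch.Tensor, B: torch.Tensor,
               alpha: float, r: int) -> None:
    """Fold a LoRA delta into the base weight in place: W += (alpha / r) * B @ A."""
    with torch.no_grad():
        linear.weight.add_((alpha / r) * (B @ A))

def unmerge_lora(linear: nn.Linear, A: torch.Tensor, B: torch.Tensor,
                 alpha: float, r: int) -> None:
    """Undo a previously merged delta: W -= (alpha / r) * B @ A."""
    with torch.no_grad():
        linear.weight.sub_((alpha / r) * (B @ A))

# Hypothetical usage: one attention projection, two task adapters.
linear = nn.Linear(4096, 4096, bias=False)
r, alpha = 8, 16
adapters = {  # task -> (A, B), with A: (r, in_features), B: (out_features, r)
    "summarize": (torch.randn(r, 4096), torch.randn(4096, r)),
    "translate": (torch.randn(r, 4096), torch.randn(4096, r)),
}

A, B = adapters["summarize"]
merge_lora(linear, A, B, alpha, r)    # activate the summarization adapter
# ... run the task ...
unmerge_lora(linear, A, B, alpha, r)  # restore the base weight
A, B = adapters["translate"]
merge_lora(linear, A, B, alpha, r)    # swap in the translation adapter
```

One caveat with in-place merging: repeated merge/unmerge cycles in low precision accumulate rounding error, so an implementation may prefer to keep the pristine base weight around and recompute W + delta on each swap.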

mikekgfb commented 5 months ago

Can somebody from the Biz Eng. team contribute this feature, please? It may need some integration work with how we handle weights, since we need partial weight updates. Ideally this would be copy-on-write: we mmap the file, then copy just the pages that are modified by updates.
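
A minimal sketch of that copy-on-write behavior, assuming a hypothetical flat float32 weight file (`base_weights.bin`) rather than torchchat's actual checkpoint format; `mmap.ACCESS_COPY` gives a private mapping in which only the pages actually written are duplicated:

```python
import mmap
import numpy as np

# Stand-in "checkpoint": a flat file of 1024 float32 weights, all zeros.
np.zeros(1024, dtype=np.float32).tofile("base_weights.bin")

with open("base_weights.bin", "rb") as f:
    # ACCESS_COPY maps the file copy-on-write: writes land in process-private
    # pages, untouched pages stay shared with the page cache, and the
    # on-disk file is never modified.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)

# Patch 16 floats (elements 100..115) with an adapter delta; the kernel
# duplicates only the page(s) these 64 bytes land on.
delta = np.ones(16, dtype=np.float32)
mm[400:464] = delta.tobytes()

print(np.frombuffer(mm[400:464], dtype=np.float32)[0])         # 1.0 in this mapping
print(np.fromfile("base_weights.bin", dtype=np.float32)[100])  # 0.0 on disk
```

Recent PyTorch releases also support `torch.load(..., mmap=True)` for memory-mapped loading of saved checkpoints, which could be a starting point for the page-level patching described above.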

We likely need a separate solution for mobile. cc: @malfet @angelayi @desertfire @JacobSzwejbka @iseeyuan