Open chauhang opened 8 months ago
Can somebody from the Biz Eng. team contribute this feature, please? It may need some integration work with how we handle weights, since we need partial weight updates. Ideally, this should be copy-on-write: we can mmap the weights file and then copy only the pages that are modified by updates.
We likely need a separate solution for mobile. cc: @malfet @angelayi @desertfire @JacobSzwejbka @iseeyuan
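To make the copy-on-write idea concrete, here is a minimal sketch using Python's standard `mmap` module with `ACCESS_COPY` (a private mapping, so writes touch only in-memory pages and never reach the file). The helper names, the flat byte layout, and the offsets are assumptions for illustration only; this is not how the runtime stores or loads weights today.

```python
import mmap
import os
import tempfile


def open_weights_copy_on_write(path):
    """Map a weights file privately; writes stay in memory (hypothetical helper)."""
    with open(path, "rb") as f:
        # ACCESS_COPY gives a copy-on-write mapping: the OS duplicates only
        # the pages that are written to, and the file on disk is not modified.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY)


def apply_partial_update(weights, offset, payload):
    """Overwrite a byte range of the mapped weights (hypothetical helper)."""
    weights[offset:offset + len(payload)] = payload


def demo():
    base = bytes(range(16)) * 1024  # 16 KiB of stand-in "weights"
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(base)
        weights = open_weights_copy_on_write(path)
        try:
            # Patch 8 bytes at an arbitrary offset chosen for this demo.
            apply_partial_update(weights, 4096, b"\xff" * 8)
            in_memory = bytes(weights[4096:4104])
        finally:
            weights.close()
        with open(path, "rb") as f:
            file_unchanged = f.read() == base
        return in_memory, file_unchanged
    finally:
        os.remove(path)
```

Calling `demo()` returns the patched bytes as seen through the mapping, plus a flag confirming that the file on disk still holds the original contents.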
For task-specific domain adaptation, support for LoRA weights is needed for a variety of LLM and diffusion model use cases.
Low-latency swapping of adapter weights is a key requirement for the above use cases. Recompiling the entire model is not a practical option because of the latency involved.
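As a rough illustration of why swapping can be cheap, the sketch below keeps the base weight fixed and applies the LoRA update as a separate low-rank path, y = W x + scale * B (A x), so changing adapters only replaces the small A and B matrices. The `LoRALinear` and `LoRAAdapter` names and the pure-Python list-of-lists matrices are assumptions for illustration, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


@dataclass
class LoRAAdapter:
    a: Matrix           # shape (r, in_features)
    b: Matrix           # shape (out_features, r)
    scale: float = 1.0  # typically alpha / r


class LoRALinear:
    """Linear layer with a frozen base weight and a swappable adapter (hypothetical)."""

    def __init__(self, weight: Matrix):
        self.weight = weight  # base weight, never modified
        self.adapter: Optional[LoRAAdapter] = None

    def swap_adapter(self, adapter: Optional[LoRAAdapter]) -> None:
        # Swapping is a reference assignment; the base weight is untouched.
        self.adapter = adapter

    def forward(self, x: List[float]) -> List[float]:
        y = matvec(self.weight, x)
        if self.adapter is not None:
            low_rank = matvec(self.adapter.a, x)       # project down to rank r
            delta = matvec(self.adapter.b, low_rank)   # project back up
            y = [yi + self.adapter.scale * di for yi, di in zip(y, delta)]
        return y
```

For example, with an identity base weight `[[1.0, 0.0], [0.0, 1.0]]` and input `[1.0, 2.0]`, attaching `LoRAAdapter(a=[[1.0, 1.0]], b=[[1.0], [0.0]])` changes the output from `[1.0, 2.0]` to `[4.0, 2.0]`, and `swap_adapter(None)` restores the base behavior.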