Discussing LoraHub: Exploration, Implementation, and Potential Improvements

sail-sg / lorahub

[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

MIT License

583 stars, 35 forks

LoraHub is a really great idea, similar to a few ideas I thought of yesterday.

Unlike MoE, which trains many domain experts, this approach trains multiple LoRAs on top of one large base model.
During inference, a router mechanism selects which LoRA weights to combine, so only one base model needs to be deployed. Like a chain of trees, running inference several times can give better performance.
LoRA training can also be made more aggressive, in both parameter count and data, so it is ready to scale up. For example, take a 65B base model and separately train eight 1B LoRAs, one on high-quality data from each of 8 different domains. Has anyone compared whether this performs better or worse than MoE?
It is not yet clear to me which base models the paper used, what the training settings were, how the LoRAs are merged for inference, and many other details. I am waiting for the code to be published to learn more.
How to design the router mechanism well is also worth researching and discussing. Can anyone recommend related materials?
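
To make the router idea above concrete, here is a minimal sketch of one possible way to combine several LoRAs at inference time. It is not taken from the paper, whose merging details are still unclear to me. It assumes each LoRA is a pair of low-rank matrices (B, A), that a hypothetical router outputs one score per LoRA, and that the merged update is the softmax-weighted sum of the individual updates B_i A_i. The adapter names, matrix values, and router scores are all made up for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def merge_loras(loras, weights):
    """Return sum_i weights[i] * (B_i @ A_i) as one dense update matrix."""
    rows = len(loras[0][0])      # B is d_out x r
    cols = len(loras[0][1][0])   # A is r x d_in
    delta = [[0.0] * cols for _ in range(rows)]
    for (b, a), w in zip(loras, weights):
        update = matmul(b, a)
        for i in range(rows):
            for j in range(cols):
                delta[i][j] += w * update[i][j]
    return delta

# Two toy rank-1 adapters for a 2x2 layer, stored as (B, A). Values are made up.
lora_math = ([[1.0], [0.0]], [[1.0, 0.0]])
lora_code = ([[0.0], [1.0]], [[0.0, 1.0]])

# Hypothetical router output for one input: it prefers the first adapter.
router_scores = [2.0, 0.0]
weights = softmax(router_scores)

# Merge the adapters and add the result to the (toy) frozen base weight.
delta_w = merge_loras([lora_math, lora_code], weights)
base_w = [[1.0, 0.0], [0.0, 1.0]]
merged_w = [[base_w[i][j] + delta_w[i][j] for j in range(2)] for i in range(2)]

print("router weights:", weights)
print("merged weight:", merged_w)
```

In this sketch only the small (B, A) pairs differ per domain, so a single copy of the base weights is enough for deployment. How the router scores should actually be produced is exactly the open question.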
