Open junhwi opened 6 months ago
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling https://arxiv.org/abs/2312.15166
Fast Inference of Mixture-of-Experts Language Models with Offloading https://paperswithcode.com/paper/fast-inference-of-mixture-of-experts-language
Weak-to-Strong Generalization https://openai.com/research/weak-to-strong-generalization
SSM https://youtu.be/dKJEpOtVgXc
SOLAR https://arxiv.org/abs/2312.15166 https://chat.lmsys.org/ (refer to leaderboard) https://twitter.com/LChoshen/status/1739993589969564027
Model Merge https://github.com/cg123/mergekit https://arxiv.org/abs/2306.01708