Closed · QAQdev closed this 3 weeks ago
We propose *Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models* (DynMoE): a routing mechanism that allows a variable number of experts per token, together with a procedure for dynamically adjusting the number of experts during training (a rough sketch of the gating idea is given below).
I think this may match your survey's focus! Please consider including this paper.
We also provide our implementation at LINs-lab/DynMoE.
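To illustrate the variable-experts-per-token idea, here is a minimal PyTorch sketch of threshold-based gating, where each token activates every expert whose score exceeds that expert's learnable threshold. The class and parameter names (`TopAnyGate`, `tau`) and the sigmoid scoring are illustrative assumptions, not the exact implementation in the repo:

```python
import torch
import torch.nn as nn

class TopAnyGate(nn.Module):
    """Sketch of gating where any number of experts can fire per token."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_experts)
        # One learnable activation threshold per expert (illustrative).
        self.tau = nn.Parameter(torch.zeros(num_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model) -> scores: (tokens, num_experts)
        scores = torch.sigmoid(self.scorer(x))
        # A token routes to every expert whose score exceeds that expert's
        # threshold, so the number of active experts varies per token
        # (possibly zero; a real implementation would need a fallback).
        mask = scores > torch.sigmoid(self.tau)
        return scores * mask  # gate weights, zeroed for inactive experts
```

Since each expert has its own threshold, one plausible way the expert count can change during training is to drop experts that are never activated and add new ones when many tokens activate none.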
Thanks for providing the information. We will check it out and try to include it in our revision.
Thanks a lot!