Aman-Goel1 closed 11 months ago
@lucidrains Thanks for the swift response. I thought this was an implementation of the MoE from Shazeer et al. (2017), but it's GShard's MoE. That was a confusion on my part.
Also, thanks for the ST-MoE repository. I'll most probably be using that instead!
Hi lucidrains, thanks for the amazing repository. I was wondering where the load balancing loss is? I recall there being two losses, an auxiliary loss as well as a load balancing loss, in the 2017 mixture-of-experts paper.
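For context on the question, here is a rough sketch of the importance term as I understand it from the 2017 paper: the squared coefficient of variation of the per-expert sum of gate values, scaled by a weight. This is not code from this repository. The function names, the `weight` default, and the `eps` term are my own assumptions for illustration. The paper's second term (the load loss) uses a smooth estimate of examples per expert and is not shown here.

```python
def cv_squared(values, eps=1e-10):
    """Squared coefficient of variation: variance / mean^2.

    eps is a hypothetical stabilizer to avoid division by zero.
    """
    n = len(values)
    if n <= 1:
        return 0.0
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return var / (mean ** 2 + eps)


def importance_loss(gates, weight=0.01):
    """Sketch of an importance-style balancing loss.

    gates: list of rows, one per example, each row holding the gate
           value assigned to every expert (zeros for unselected experts).
    weight: hypothetical scaling factor, chosen only for illustration.
    """
    num_experts = len(gates[0])
    # Importance of an expert = sum of its gate values over the batch.
    importance = [sum(row[e] for row in gates) for e in range(num_experts)]
    return weight * cv_squared(importance)


# Perfectly balanced routing gives zero loss; routing everything
# to one expert gives a positive loss.
balanced = [[0.5, 0.5], [0.5, 0.5]]
collapsed = [[1.0, 0.0], [1.0, 0.0]]
print(importance_loss(balanced), importance_loss(collapsed))
```

The loss is zero when every expert receives the same total gate weight and grows as routing concentrates on fewer experts.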