lucidrains / mixture-of-experts

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models
MIT License
628 stars 49 forks

Would you elaborate more on the enhancement? #9

Open yhyu13 opened 1 year ago

yhyu13 commented 1 year ago

"It will mostly be a line-by-line transcription of the tensorflow implementation here, with a few enhancements."

Could you elaborate on the "few enhancements" that the PyTorch re-implementation makes on top of the TensorFlow implementation?

Thanks!