Would you elaborate more on the enhancement?

lucidrains / mixture-of-experts

A Pytorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

MIT License

628 stars 49 forks

Open yhyu13 opened 1 year ago

yhyu13 commented 1 year ago

"It will mostly be a line-by-line transcription of the tensorflow implementation here, with a few enhancements."

Would you elaborate on the few enhancements made on top of the TensorFlow implementation when it was re-implemented in PyTorch?

Thanks!