Closed (nuzant closed 3 months ago)
Changes:
Add MoE model support (Mixtral); reference: the Megatron-LM MoE implementation.
Add a global stats tracker to log stats that cannot be obtained directly in the interface implementation (e.g., the aux loss in the MoE router).
Update README.md to acknowledge the open-source projects used as references.
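For illustration only, here is a minimal sketch of how a global stats tracker could let a MoE router report its aux loss without passing it back through the interface. It is not the code in this PR: `StatsTracker`, `TRACKER`, `load_balancing_loss`, `route`, and the key `"moe_aux_loss"` are all hypothetical names, and the loss is assumed to be a Switch-Transformer-style load-balancing term with top-1 routing, written in plain Python rather than with tensors.

```python
from collections import defaultdict
from typing import Dict, List


class StatsTracker:
    """Hypothetical tracker that accumulates named scalars and reports their means."""

    def __init__(self) -> None:
        self._sums: Dict[str, float] = defaultdict(float)
        self._counts: Dict[str, int] = defaultdict(int)

    def record(self, name: str, value: float) -> None:
        # Called from deep inside the model, e.g. from the router.
        self._sums[name] += float(value)
        self._counts[name] += 1

    def export(self, reset: bool = True) -> Dict[str, float]:
        # Called by the training loop or interface once per step.
        means = {k: self._sums[k] / self._counts[k] for k in self._sums}
        if reset:
            self._sums.clear()
            self._counts.clear()
        return means


# Assumption: a module-level singleton plays the role of the "global" tracker.
TRACKER = StatsTracker()


def load_balancing_loss(router_probs: List[List[float]]) -> float:
    """Assumed aux loss: num_experts * sum_i(f_i * P_i) with top-1 routing.

    f_i is the fraction of tokens routed to expert i,
    P_i is the mean router probability assigned to expert i.
    """
    num_tokens = len(router_probs)
    num_experts = len(router_probs[0])
    token_frac = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for probs in router_probs:
        top = max(range(num_experts), key=probs.__getitem__)
        token_frac[top] += 1.0 / num_tokens
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens
    return num_experts * sum(f * p for f, p in zip(token_frac, mean_prob))


def route(router_probs: List[List[float]]) -> List[int]:
    """Toy router: returns the chosen expert per token and logs the aux loss."""
    TRACKER.record("moe_aux_loss", load_balancing_loss(router_probs))
    return [max(range(len(p)), key=p.__getitem__) for p in router_probs]
```

In this sketch the caller of the model only sees the routing result, and the logging code later calls `TRACKER.export()` to collect whatever the router recorded during the step.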