intelligent-machine-learning / dlrover

DLRover: An Automatic Distributed Deep Learning System

Add balance loss in atorch moe example #1300

Open skydoorkai opened 1 month ago

skydoorkai commented 1 month ago

Add a balance loss to the MoE example.

This adds an auxiliary (load-balancing) loss to the router in the MoE example, similar to the aux loss implementation in Megatron MoE.
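
For readers unfamiliar with router aux losses, here is a minimal sketch of the common Switch Transformer-style load-balancing loss, written in plain Python so it runs without dependencies. It is not the code from this PR, atorch, or Megatron; the function names (`softmax`, `load_balancing_loss`) and the parameters `top_k` and `coeff` are hypothetical and chosen only for illustration. For each expert it multiplies the fraction of routed tokens by the mean router probability, sums over experts, and scales by the number of experts.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=1, coeff=0.01):
    """Switch-style aux loss (illustrative sketch, not the PR's code).

    router_logits: list of per-token lists, shape [num_tokens][num_experts].
    top_k and coeff are assumed hyperparameters for this sketch.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count how many tokens each expert receives under top-k routing.
    counts = [0] * num_experts
    for p in probs:
        ranked = sorted(range(num_experts), key=lambda i: p[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1

    # f_i: fraction of routing slots assigned to expert i.
    frac = [c / (num_tokens * top_k) for c in counts]
    # P_i: mean router probability assigned to expert i.
    mean_prob = [
        sum(p[i] for p in probs) / num_tokens for i in range(num_experts)
    ]

    # Scaled so that perfectly uniform routing gives a value of coeff.
    return coeff * num_experts * sum(f * q for f, q in zip(frac, mean_prob))


# Each token prefers a different expert: evenly balanced routing.
balanced = [[2.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
# Every token prefers expert 0: routing has collapsed onto one expert.
collapsed = [[2.0, 0.0, 0.0, 0.0] for _ in range(4)]

print(load_balancing_loss(balanced))
print(load_balancing_loss(collapsed))
```

The collapsed case yields a larger value than the balanced one, which is what pushes the router toward spreading tokens across experts when this term is added to the training loss.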