thu-pacman / SmartMoE-AE

ATC23 AE
GNU General Public License v3.0

The implementation of `update_expert_mapping` does not work for optimizer states of experts #3

Open qyhfrank opened 2 months ago

qyhfrank commented 2 months ago

Thank you for your work! I've recently been reading the source code of SmartMoE and noticed that the `update_expert_mapping` function in layer.py has no implementation for transferring the optimizer states of experts. Could this cause issues with gradient updates?
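To illustrate the concern: when dynamic expert placement moves an expert's weights to a new slot or rank, stateful optimizers such as Adam keep per-parameter moments that must move with them; otherwise subsequent updates apply another expert's momentum to the wrong weights. The following is a minimal, hypothetical sketch (not SmartMoE's actual code; the function name and data layout are assumptions for illustration) showing the permutation applied to both weights and optimizer state:

```python
# Hypothetical sketch, NOT SmartMoE's implementation: experts and their
# optimizer states are modeled as plain per-expert lists. The point is
# that a new expert mapping must permute BOTH lists identically.

def update_expert_mapping(params, opt_state, new_mapping):
    """Permute expert parameters and their optimizer states together.

    params:      list of per-expert weight blobs
    opt_state:   list of per-expert optimizer states (e.g. Adam moments)
    new_mapping: new_mapping[i] = old index of the expert now in slot i
    """
    new_params = [params[old] for old in new_mapping]
    # This is the step the issue says is missing: without it, slot i
    # keeps the moments of whichever expert used to live there.
    new_state = [opt_state[old] for old in new_mapping]
    return new_params, new_state

params = ["w0", "w1", "w2"]
state = ["m0", "m1", "m2"]
p, s = update_expert_mapping(params, state, [2, 0, 1])
# p == ["w2", "w0", "w1"]; s == ["m2", "m0", "m1"] — weights and moments
# stay aligned after the remap.
```

In real PyTorch code the analogous fix would also rekey `optimizer.state`, since it is a dict keyed by the parameter objects themselves, not by position.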

zms1999 commented 2 months ago

I have noticed this problem and am working on it.