Open WhatBrain opened 3 weeks ago
@zms1999 Have you observed the same issue before?
how to use @SagarChandra07
Proceed with caution. This post appears to be phishing.
Do not run that file! The account has been spamming links to malware all over GitHub. In at least some cases they use a password protected zip archive to evade the auto-check by MediaFire.
OK. Thank you very much.
Describe the bug
I am using the Megatron-LM v2.5 patch, run with the command:
FMOE_FASTER_SHADOW_ENABLE=1 FMOE_FASTER_SCHEDULE_ENABLE=1 FMOE_FASTER_GROUP_SIZE=4 bash pretrain_gpt_distributed.sh
I am training GPT-2 + MoE on a single node with 8 GPUs and 16 experts in total. In the profiler I can see that each GPU holds 2 experts, split into 2 groups, and each expert runs twice. However, the 4 R operations only start together after the C operations of all experts have finished. Why does this happen? Many thanks to anyone who can answer.

Logs
If applicable, add logs to help explain your problem.
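For context, here is a reader's sketch of the ordering the question describes. This is not FastMoE source code; the step names C (expert compute) and R (gather/receive), and the rule that 16 experts with FMOE_FASTER_GROUP_SIZE=4 form 4 pipeline groups, are assumptions based on the smart-schedule design. Under a pipelined schedule, each group's R would follow its own C so it can overlap the next group's compute; the profiler instead shows all C steps first and then all 4 R steps together.

```python
# Reader's sketch, NOT FastMoE internals: it only models the expected
# vs. observed launch ordering of per-group compute (C) and gather (R) steps.

def expected_order(num_experts, group_size):
    """Pipelined schedule: each group's R is issued right after its own C,
    so R of group i can overlap with C of group i + 1."""
    n_groups = num_experts // group_size  # e.g. 16 // 4 = 4 groups
    order = []
    for i in range(n_groups):
        order += [f"C{i}", f"R{i}"]
    return order

def observed_order(num_experts, group_size):
    """What the profiler in the question shows: every C step completes
    first, then all R operations are launched together at the end."""
    n_groups = num_experts // group_size
    return [f"C{i}" for i in range(n_groups)] + \
           [f"R{i}" for i in range(n_groups)]

print(expected_order(16, 4))  # ['C0', 'R0', 'C1', 'R1', 'C2', 'R2', 'C3', 'R3']
print(observed_order(16, 4))  # ['C0', 'C1', 'C2', 'C3', 'R0', 'R1', 'R2', 'R3']
```

If the observed order is the real launch order (and not just how the profiler buckets the streams), the R steps are not overlapping the C steps, which would explain the missing pipelining benefit.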
Platform