Open R-QinQ opened 10 months ago
Which test is this error produced by?
It is produced by the test_fmoe_linear_distributed() function in test_ddp.py, and every parameter combination of that test fails.
I am not able to reproduce this issue. You may need to verify that the NCCL version bundled with your PyTorch matches the NCCL version you used to compile FastMoE. You can get PyTorch's NCCL version with print(torch.cuda.nccl.version()).
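For anyone who wants to run that check, here is a minimal sketch. It assumes a PyTorch build with NCCL support; the helper names `format_nccl_version` and `pytorch_nccl_version` are made up for this example and are not part of PyTorch or FastMoE. Recent PyTorch releases return a tuple from `torch.cuda.nccl.version()`, while older ones returned a single integer code, so the sketch normalizes both to a dotted string.

```python
def format_nccl_version(ver):
    """Normalize an NCCL version to 'major.minor.patch' (hypothetical helper)."""
    if isinstance(ver, tuple):
        # Newer PyTorch: (major, minor, patch) or (major, minor, patch, suffix).
        return ".".join(str(part) for part in ver[:3])
    code = int(ver)
    # NCCL's integer encoding: major*1000 + minor*100 + patch up to 2.8,
    # and major*10000 + minor*100 + patch from 2.9 onward.
    if code < 10000:
        major, rest = divmod(code, 1000)
    else:
        major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def pytorch_nccl_version():
    """Return PyTorch's NCCL version as a string, or None if unavailable."""
    try:
        import torch
        from torch.cuda import nccl
        return format_nccl_version(nccl.version())
    except Exception:
        # PyTorch is missing or was built without NCCL support.
        return None


if __name__ == "__main__":
    print("PyTorch NCCL version:", pytorch_nccl_version())
```

Compare the printed value with the version of the NCCL installation that FastMoE was compiled against. A mismatch would be the first thing to rule out.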
I found that the moe is 0, but I don't know why.