laekov / fastmoe

A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0

pytest error #183

Open R-QinQ opened 11 months ago

R-QinQ commented 11 months ago

I found that the MoE is 0, but I don't know why. [screenshot]

laekov commented 11 months ago

Which test is this error produced by?

R-QinQ commented 11 months ago

> Which test is this error produced by?

It is produced by the test_fmoe_linear_distributed() function in test_ddp.py, and the test fails for all of its parameter combinations. [screenshot]

laekov commented 11 months ago

I am not able to reproduce this issue. You may need to verify that the NCCL version of your PyTorch matches the NCCL version you used to compile FastMoE. You can get PyTorch's NCCL version with print(torch.cuda.nccl.version()).
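A minimal sketch of that check, under a few assumptions. Depending on the PyTorch release, torch.cuda.nccl.version() returns either a tuple (major, minor, patch[, suffix]) or, in older releases, a single integer version code, so the helper normalizes both forms. The compile-time version is read from the NCCL header (nccl.h defines NCCL_MAJOR, NCCL_MINOR and NCCL_PATCH). Which nccl.h the FastMoE build actually picked up depends on your environment; the header path passed to the hypothetical report() helper is an assumption you must supply yourself.

```python
import re
from typing import Optional, Tuple, Union

Version = Tuple[int, int, int]


def normalize_nccl_version(v: Union[int, tuple]) -> Version:
    """Convert torch.cuda.nccl.version() output to (major, minor, patch)."""
    if isinstance(v, tuple):
        # Newer PyTorch: (major, minor, patch) plus an optional suffix string.
        major, minor, patch = (int(x) for x in v[:3])
        return (major, minor, patch)
    if v >= 10000:
        # NCCL >= 2.9 encodes the version as X*10000 + Y*100 + Z.
        return (v // 10000, (v % 10000) // 100, v % 100)
    # Older NCCL encodes the version as X*1000 + Y*100 + Z.
    return (v // 1000, (v % 1000) // 100, v % 100)


def parse_nccl_header(text: str) -> Version:
    """Extract (major, minor, patch) from the text of an nccl.h header."""
    parts = []
    for name in ("NCCL_MAJOR", "NCCL_MINOR", "NCCL_PATCH"):
        m = re.search(rf"#define\s+{name}\s+(\d+)", text)
        if m is None:
            raise ValueError(f"{name} not found in header")
        parts.append(int(m.group(1)))
    return (parts[0], parts[1], parts[2])


def report(header_path: Optional[str] = None) -> None:
    """Hypothetical helper: print PyTorch's NCCL version, optionally vs. a header."""
    import torch  # imported lazily so the helpers above work without PyTorch

    torch_ver = normalize_nccl_version(torch.cuda.nccl.version())
    print("PyTorch NCCL version:", ".".join(map(str, torch_ver)))
    if header_path is not None:
        with open(header_path) as f:
            build_ver = parse_nccl_header(f.read())
        print("nccl.h NCCL version: ", ".".join(map(str, build_ver)))
        print("match" if build_ver == torch_ver else "MISMATCH")
```

For example, report("/path/to/nccl.h") prints both versions, where the path stands for whichever header your FastMoE build used. The comparison here is strict on major, minor and patch; relax it if you only care about the major and minor versions.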