laekov / fastmoe

A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0

pytest error #183

Open R-QinQ opened 10 months ago

R-QinQ commented 10 months ago

I found that the moe is 0, but I don't know why. [image]

laekov commented 10 months ago

Which test is this error produced by?

R-QinQ commented 10 months ago

> Which test is this error produced by?

It is produced by the test_fmoe_linear_distributed() function in test_ddp.py, and every parameter combination of that test fails. [image]

laekov commented 10 months ago

I am not able to reproduce this issue. You may need to verify that the NCCL version bundled with your PyTorch matches the NCCL version you used to compile FastMoE. You can get PyTorch's NCCL version with print(torch.cuda.nccl.version()).
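For reference, a minimal sketch of that check. The helper names format_nccl_version and pytorch_nccl_version are hypothetical and not part of FastMoE or PyTorch. It assumes that torch.cuda.nccl.version() returns a (major, minor, patch) tuple on recent PyTorch releases and a single integer code on older ones, and it decodes the integer according to NCCL's version-code scheme.

```python
def format_nccl_version(raw):
    """Turn an NCCL version value into a 'major.minor.patch' string.

    Hypothetical helper: accepts either a tuple such as (2, 10, 3)
    or an integer NCCL version code such as 2708 or 21003.
    """
    if isinstance(raw, tuple):
        # Recent PyTorch: (major, minor, patch[, suffix]).
        return ".".join(str(part) for part in raw[:3])
    code = int(raw)
    if code < 10000:
        # NCCL up to 2.8.x encodes major * 1000 + minor * 100 + patch.
        major, rest = divmod(code, 1000)
    else:
        # NCCL 2.9 and later encodes major * 10000 + minor * 100 + patch.
        major, rest = divmod(code, 10000)
    minor, patch = divmod(rest, 100)
    return f"{major}.{minor}.{patch}"


def pytorch_nccl_version():
    """Return PyTorch's NCCL version as a string, or None if unavailable."""
    try:
        import torch
        return format_nccl_version(torch.cuda.nccl.version())
    except Exception:
        # torch is not installed, or this build has no NCCL support.
        return None


if __name__ == "__main__":
    print("PyTorch NCCL version:", pytorch_nccl_version())
```

Compare the printed value with the version of the NCCL installation that was on the include and library paths when FastMoE was built. Where that installation lives depends on your system, so it is not hard-coded here.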