laekov / fastmoe

A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0

A bug in switch_gate #199

Open Heihaierr opened 8 months ago

Heihaierr commented 8 months ago

Describe the bug

In `fmoe/gates/switch_gate.py`, line 45 reads: `capacity = math.ceil(cap_rate * inp.shape[0])`

Shouldn't it be `capacity = math.ceil(cap_rate * inp.shape[0] / self.num_expert)`?
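A minimal sketch of why the division matters, assuming 128 tokens, 5 experts, and a `cap_rate` of 1.2 (values chosen only for illustration). `capacity_current`, `capacity_proposed`, and `drop_overflow` are hypothetical helpers, not fastmoe functions. `drop_overflow` merely mimics a capacity limit that marks overflowing tokens with -1 in arrival order; fastmoe's own limiting may choose which tokens to drop differently.

```python
import math
from collections import Counter


def capacity_current(cap_rate, num_tokens):
    # Formula on line 45 of fmoe/gates/switch_gate.py, as quoted in this issue.
    return math.ceil(cap_rate * num_tokens)


def capacity_proposed(cap_rate, num_tokens, num_expert):
    # Proposed fix: split the capacity budget evenly across the experts.
    return math.ceil(cap_rate * num_tokens / num_expert)


def drop_overflow(expert_idx, capacity):
    """Hypothetical helper: keep at most `capacity` tokens per expert.

    Tokens arriving after their expert is full are marked -1 (dropped).
    """
    seen = Counter()
    kept = []
    for e in expert_idx:
        if seen[e] < capacity:
            seen[e] += 1
            kept.append(e)
        else:
            kept.append(-1)
    return kept


num_tokens, num_expert, cap_rate = 128, 5, 1.2
print(capacity_current(cap_rate, num_tokens))               # 154
print(capacity_proposed(cap_rate, num_tokens, num_expert))  # 31

# Fully skewed routing: every token picks expert 0.
routing = [0] * num_tokens
print(drop_overflow(routing, capacity_current(cap_rate, num_tokens)).count(-1))              # 0
print(drop_overflow(routing, capacity_proposed(cap_rate, num_tokens, num_expert)).count(-1))  # 97
```

With the current formula the per-expert capacity (154) is larger than the whole batch (128 tokens), so even fully skewed routing drops nothing and the limit never takes effect. The proposed formula caps each expert at 31 tokens.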

laekov commented 8 months ago

That is a good point. I think you are right. Can you please open a pull request on this? Thanks.

BTW, I am also wondering if the capacity calculation in GShardGate is wrong. @zms1999

Peg-Wu commented 7 months ago

Hi, guys!
Thanks for your fantastic work. I ran into a problem when using the `SwitchGate` class. Could you take a look at it for me?

The following is my code:

```python
import torch
from fmoe.gates import SwitchGate

device = torch.device("cuda:0")

# 5 experts per worker, 2 workers (this is the setting that crashes)
sg = SwitchGate(d_model=64, num_expert=5, world_size=2)
sg = sg.to(device)

x = torch.rand(128, 64, device=device)  # (batch_size, d_model)

idx, val = sg(x)  # top-1 expert index and gate score for each token
print(idx, idx.shape)
print(val, val.shape)
```

The parameter `world_size` can only be set to 1; any other value causes the error "Segmentation fault (core dumped)".

laekov commented 7 months ago

@Peg-Wu Since you are not using `torch.distributed`, `world_size` has to be 1.
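One way to see why `world_size=2` cannot work in a single process: the gate picks among `num_expert * world_size` experts in total, so some of the chosen indices belong to a second worker that does not exist without `torch.distributed`. The sketch below assumes experts are assigned to ranks in contiguous blocks of `num_expert`; `expert_layout` and `ranks_needed` are hypothetical helpers, not fastmoe functions.

```python
from __future__ import annotations


def expert_layout(num_expert: int, world_size: int) -> dict[int, list[int]]:
    # Assumed layout: rank r hosts global experts r*num_expert .. (r+1)*num_expert - 1.
    return {
        rank: list(range(rank * num_expert, (rank + 1) * num_expert))
        for rank in range(world_size)
    }


def ranks_needed(gate_indices: list[int], num_expert: int) -> set[int]:
    # Ranks that would have to receive tokens for these top-1 expert choices.
    return {idx // num_expert for idx in gate_indices}


print(expert_layout(num_expert=5, world_size=2))  # {0: [0, 1, 2, 3, 4], 1: [5, 6, 7, 8, 9]}
print(ranks_needed([0, 3, 7, 9], num_expert=5))   # {0, 1}
```

Under this assumption, any token routed to experts 5 to 9 would have to be sent to rank 1. In a single process there is no rank 1 to exchange tokens with, which presumably is what leads to the crash.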

Peg-Wu commented 7 months ago

Thanks for your reply!

If I want to use DDP to speed things up, how should I modify the code? Can I use PyTorch's official DDP for parallel training?

laekov commented 7 months ago

@Peg-Wu Refer to this test

Peg-Wu commented 7 months ago

Thank you very much!