lucidrains / mixture-of-experts

A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models

RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long #2

Closed: littlepan0413 closed this issue 3 years ago

littlepan0413 commented 3 years ago

Code:

import torch
from mixture_of_experts import HeirarchicalMoE

moe = HeirarchicalMoE(
    dim = 512,
    num_experts = (4, 4)   # 4 gates on the first layer, then 4 experts on the second, equaling 16 experts
)

inputs = torch.randn(4, 1024, 512)
out, aux_loss = moe(inputs) # (4, 1024, 512), (1,)

Output:

Traceback (most recent call last):
  File "/home/bi/panlu/ComplexQG-MOE/test/test3.py", line 20, in <module>
    out, aux_loss = moe(inputs) # (4, 1024, 512), (1,)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/mixture_of_experts/mixture_of_experts.py", line 254, in forward
    dispatch_tensor, combine_tensor, loss = self.gate(inputs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bi/software/anaconda/lib/python3.6/site-packages/mixture_of_experts/mixture_of_experts.py", line 217, in forward
RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long
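For context, this class of error in PyTorch usually means an integer (Long) tensor was combined with a float tensor somewhere in the gating path, for example a one-hot gate mask that was never cast to float; newer PyTorch versions promote the dtypes automatically, while older ones raise. Below is a minimal, hypothetical sketch of the same mismatch and the usual cast fix; the tensor names are illustrative and not the library's internals.

```python
import torch
import torch.nn.functional as F

# A one-hot gate mask comes back as a LongTensor (int64)
gate_index = torch.tensor([0, 2, 1])
mask = F.one_hot(gate_index, num_classes = 4)   # dtype: torch.int64

scores = torch.randn(3, 4)                      # dtype: torch.float32

# On older PyTorch (before automatic dtype promotion) mixing the two raises:
#   RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long
# out = scores * mask

# Casting the integer mask to float sidesteps the mismatch
out = scores * mask.float()
print(out.dtype)  # torch.float32
```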

lucidrains commented 3 years ago

@littlepan0413 it works for me, what version of pytorch are you using?
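A quick way to gather the information being asked for; whether upgrading actually resolves the error is an assumption, since the thread does not record the affected version:

```python
import torch
print(torch.__version__)   # report this in the issue; very old releases lack automatic dtype promotion
```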