lucidrains/mixture-of-experts
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models.
MIT License · 628 stars · 49 forks
Issues (newest first)
#11 PEER implementation · huu4ontocord · closed 3 months ago · 1 comment
#10 Load balancing loss? · Aman-Goel1 · closed 11 months ago · 2 comments
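Issue #10 asks about the auxiliary loss that keeps the gate from collapsing onto a few experts. A minimal sketch of the idea from Shazeer et al.'s sparsely-gated MoE paper: penalize the squared coefficient of variation of per-expert usage. Names and the threshold below are illustrative, not this repo's exact API.

```python
import torch

def cv_squared(x, eps=1e-10):
    # Squared coefficient of variation: variance / mean^2.
    # Small when values are roughly uniform across experts, large when skewed.
    x = x.float()
    return x.var() / (x.mean() ** 2 + eps)

# gates: (num_tokens, num_experts) routing weights from a softmax gate.
gates = torch.softmax(torch.randn(1024, 8), dim=-1)

importance = gates.sum(dim=0)            # total routing weight per expert
# Hard-threshold proxy for tokens-per-expert; the paper uses a smooth,
# differentiable load estimator instead.
load = (gates > 0.01).sum(dim=0).float()

# Added to the main loss, scaled by a small coefficient.
aux_loss = cv_squared(importance) + cv_squared(load)
```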
#9 Would you elaborate more on the enhancement? · yhyu13 · opened 1 year ago · 0 comments
#8 convolution operation · Yonsun-w · opened 2 years ago · 0 comments
#7 Regarding experts.w1 and experts.w2 gradients · MukundVarmaT · opened 2 years ago · 1 comment
#6 Implicit inplace operation '*=' causes an error in the backward pass in PyTorch · VRCMF · closed 2 years ago · 1 comment
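The error in #6 is standard PyTorch autograd behavior: `*=` mutates a tensor in place, and if autograd saved that tensor for the backward pass, `backward()` raises a RuntimeError. A standalone reproduction and the out-of-place fix (not code from this repo):

```python
import torch

w = torch.randn(4, requires_grad=True)

y = torch.sigmoid(w)   # sigmoid saves its output to compute its gradient
y *= 2                 # in-place: overwrites the saved output
# y.sum().backward()   # RuntimeError: one of the variables needed for
                       # gradient computation has been modified by an
                       # inplace operation

y2 = torch.sigmoid(w)
y2 = y2 * 2            # out-of-place: allocates a new tensor, backward works
y2.sum().backward()
```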
#5 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! · mxs30443 · opened 3 years ago · 1 comment
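The error in #5 means the model's parameters and the input tensors live on different devices; the usual fix is to move both explicitly. A generic sketch (a plain Linear stands in for the MoE module):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(512, 512).to(device)  # moves all parameters/buffers
x = torch.randn(8, 512, device=device)        # allocate input on same device

out = model(x)  # no cross-device mismatch
```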
#4 Question: why is an exclusive cumsum needed in the gating method? · kugwzk · closed 3 years ago · 0 comments
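On #4: in capacity-limited gating, each token needs its position among the tokens routed to the same expert, and an exclusive cumulative sum over the one-hot assignment mask gives exactly that, since each token counts only the tokens dispatched before it, not itself. A small illustrative sketch, not the repo's actual gating code:

```python
import torch
import torch.nn.functional as F

num_experts = 3
assignments = torch.tensor([0, 2, 0, 1, 0, 2])        # expert index per token
mask = F.one_hot(assignments, num_experts)            # (tokens, experts)

# Inclusive cumsum counts each token itself; subtracting the mask makes it
# exclusive, so each token sees only the tokens routed before it.
position_in_expert = (mask.cumsum(dim=0) - mask) * mask

capacity = 2
keep = position_in_expert.sum(dim=-1) < capacity      # drop tokens over capacity
```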
#3 Error reported under FP16 training · SefaZeng · closed 3 years ago · 1 comment
#2 RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long · littlepan0413 · closed 3 years ago · 1 comment
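The Long-vs-Float error in #2 is a generic dtype mismatch: an op expecting floats received an integer tensor (indices from `arange` or `argmax` are common sources). A standalone illustration of the usual fix, an explicit cast:

```python
import torch

idx = torch.arange(4)        # integer tensor, dtype torch.int64 (Long)
w = torch.randn(4)

# Many ops require float inputs and reject Long tensors:
# torch.nn.functional.mse_loss(idx, w)  # fails with a Long-vs-Float error

x = idx.float()              # cast explicitly before float-only ops
loss = torch.nn.functional.mse_loss(x, w)
```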
#1 Segmentation Fault? · SungMinCho · closed 4 years ago · 1 comment