lucidrains/mixture-of-experts
A PyTorch implementation of Sparsely-Gated Mixture of Experts, for massively increasing the parameter count of language models.
MIT License · 628 stars · 49 forks
Issues (newest first)
#11 PEER implementation · huu4ontocord · closed 3 months ago · 1 comment
#10 Load balancing loss? · Aman-Goel1 · closed 11 months ago · 2 comments
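Issue #10 asks about the auxiliary loss that keeps the gate from collapsing onto a few experts. A minimal sketch of the idea from Shazeer et al.'s sparsely-gated MoE paper: penalize the squared coefficient of variation of per-expert usage. Names and the threshold below are illustrative, not this repo's exact API.

```python
import torch

def cv_squared(x, eps=1e-10):
    # Squared coefficient of variation: variance / mean^2.
    # Small when values are roughly uniform across experts, large when skewed.
    x = x.float()
    return x.var() / (x.mean() ** 2 + eps)

# gates: (num_tokens, num_experts) routing weights from a softmax gate.
gates = torch.softmax(torch.randn(1024, 8), dim=-1)

importance = gates.sum(dim=0)            # total routing weight per expert
# Hard-threshold proxy for tokens-per-expert; the paper uses a smooth,
# differentiable load estimator instead.
load = (gates > 0.01).sum(dim=0).float()

# Added to the main loss, scaled by a small coefficient.
aux_loss = cv_squared(importance) + cv_squared(load)
```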
#9 Would you elaborate more on the enhancement? · yhyu13 · opened 1 year ago · 0 comments
#8 convolution operation · Yonsun-w · opened 2 years ago · 0 comments
#7 Regarding experts.w1 and experts.w2 gradients · MukundVarmaT · opened 2 years ago · 1 comment
#6 Implicit inplace operation '*=' causes an error in the backward pass in PyTorch · VRCMF · closed 2 years ago · 1 comment
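The error in #6 is standard PyTorch autograd behavior: `*=` mutates a tensor in place, and if autograd saved that tensor for the backward pass, `backward()` raises a RuntimeError. A standalone reproduction and the out-of-place fix (not code from this repo):

```python
import torch

w = torch.randn(4, requires_grad=True)

y = torch.sigmoid(w)   # sigmoid saves its output to compute its gradient
y *= 2                 # in-place: overwrites the saved output
# y.sum().backward()   # RuntimeError: one of the variables needed for
                       # gradient computation has been modified by an
                       # inplace operation

y2 = torch.sigmoid(w)
y2 = y2 * 2            # out-of-place: allocates a new tensor, backward works
y2.sum().backward()
```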
#5 RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! · mxs30443 · opened 3 years ago · 1 comment
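The error in #5 means the model's parameters and the input tensors live on different devices; the usual fix is to move both explicitly. A generic sketch (a plain Linear stands in for the MoE module):

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(512, 512).to(device)  # moves all parameters/buffers
x = torch.randn(8, 512, device=device)        # allocate input on same device

out = model(x)  # no cross-device mismatch
```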
#4 Question: why is an exclusive cumsum needed in the gating method? · kugwzk · closed 3 years ago · 0 comments
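On #4: in capacity-limited gating, each token needs its position among the tokens routed to the same expert, and an exclusive cumulative sum over the one-hot assignment mask gives exactly that, since each token counts only the tokens dispatched before it, not itself. A small illustrative sketch, not the repo's actual gating code:

```python
import torch
import torch.nn.functional as F

num_experts = 3
assignments = torch.tensor([0, 2, 0, 1, 0, 2])        # expert index per token
mask = F.one_hot(assignments, num_experts)            # (tokens, experts)

# Inclusive cumsum counts each token itself; subtracting the mask makes it
# exclusive, so each token sees only the tokens routed before it.
position_in_expert = (mask.cumsum(dim=0) - mask) * mask

capacity = 2
keep = position_in_expert.sum(dim=-1) < capacity      # drop tokens over capacity
```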
#3 Error reported under FP16 training · SefaZeng · closed 3 years ago · 1 comment
#2 RuntimeError: expected backend CPU and dtype Float but got backend CPU and dtype Long · littlepan0413 · closed 3 years ago · 1 comment
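The Long-vs-Float error in #2 is a generic dtype mismatch: an op expecting floats received an integer tensor (indices from `arange` or `argmax` are common sources). A standalone illustration of the usual fix, an explicit cast:

```python
import torch

idx = torch.arange(4)        # integer tensor, dtype torch.int64 (Long)
w = torch.randn(4)

# Many ops require float inputs and reject Long tensors:
# torch.nn.functional.mse_loss(idx, w)  # fails with a Long-vs-Float error

x = idx.float()              # cast explicitly before float-only ops
loss = torch.nn.functional.mse_loss(x, w)
```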
#1 Segmentation Fault? · SungMinCho · closed 4 years ago · 1 comment