laekov/fastmoe
A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0
1.57k stars
189 forks
issues
Detailed documentation about model parallelism
#214
ZSL98
opened
1 month ago
0
In Smart Schedule, the R operation does not overlap with the C operation
#213
WhatBrain
opened
1 month ago
5
bash run_enwik8_base.sh train train --work_dir /dir/
#212
WYCAS
closed
2 months ago
0
how to run transformer-xl with parallel experts with single gpu?
#211
HudashiNeo
opened
2 months ago
6
Do We support DeepSpeed training? Thanks.
#210
lzl-mt
opened
2 months ago
1
The forward pass return value is missing bal_loss
#209
tisgotos
opened
2 months ago
2
Hello, where can I get v2.2 of Megatron-LM?
#208
tisgotos
closed
2 months ago
7
Error when running examples/transformer-xl/scripts/run_enwik8_base_moe.sh with Smart schedule enabled
#207
WhatBrain
opened
2 months ago
6
No hiding output when using `pytest -s`
#206
roastduck
closed
5 months ago
0
Make the code neutral to device by removing `.cuda()`
#205
roastduck
closed
6 months ago
0
FasterMoE Shadow Policy: Detailed Inquiry
#204
Guodanding
closed
7 months ago
7
Update readme-cn.md
#203
HelloWorldLTY
closed
7 months ago
0
DDP error
#202
Peg-Wu
closed
7 months ago
0
CUDA memory increases after each loss.backward()
#201
sreetamasarkar
opened
8 months ago
6
Update switch_gate.py
#200
Heihaierr
closed
8 months ago
0
A bug in switch_gate
#199
Heihaierr
opened
8 months ago
6
About switch_gate
#198
Heihaierr
opened
8 months ago
1
multi-node problem
#197
Qianshaowei
opened
8 months ago
1
Example to run Megatron
#196
Juanhui28
opened
9 months ago
3
[BUG] AttributeError: module 'fmoe_cuda' has no attribute 'assign_pos_'
#195
pangsg
opened
9 months ago
3
cudaErrorInvalidDevice is reported when running FMOE
#194
pangsg
closed
9 months ago
6
Does fastmoe support fine-tuning?
#193
PowerDispatch
closed
9 months ago
0
Does fastmoe support fine-tuning, page-attention, flash-attention, KV cache, mixed precision, etc.?
#192
PowerDispatch
opened
9 months ago
4
Can fastmoe be integrated into VLLM?
#191
pangsg
opened
9 months ago
4
The prep_text8.py script does not exist
#190
PowerDispatch
closed
9 months ago
1
Do we have an online chat group?
#189
PowerDispatch
opened
9 months ago
1
Hello, how do I define an MoE under dp+mp in fastmoe?
#188
daixiangzi
closed
9 months ago
6
This PR resolves issue #186
#187
Cobalt-27
closed
10 months ago
0
num_experts argument error for Megatron-LM
#186
Cobalt-27
closed
10 months ago
0
[Feature] Make bias of gate optional for naive_gate and its subclasses.
#185
Zhang-RQ
closed
10 months ago
0
Segmentation fault when Smart schedule is enabled
#184
Xingzhi107
opened
11 months ago
8
pytest error
#183
R-QinQ
opened
11 months ago
3
setup.py error!
#182
R-QinQ
closed
11 months ago
4
ImportError: cannot import name 'get_args' from 'megatron'
#181
peter-fei
opened
11 months ago
5
During inference, the output of noisy gate is nan.
#180
zqhang
opened
12 months ago
5
Inconsistent evaluation result when clone expert parameters from original FFN
#179
Heihaierr
closed
1 year ago
1
MOELinear is much slower than torch.nn.Linear
#178
kamanphoebe
closed
1 year ago
7
ModuleNotFoundError: No module named 'fmoe_cuda'
#177
Taskii-Lei
opened
1 year ago
3
how to use balance loss?
#176
Heihaierr
opened
1 year ago
1
update clip-grad-v2.2.patch for grads_in_moe is empty
#175
Fragile-azalea
closed
1 year ago
0
Fix tests
#174
laekov
closed
1 year ago
0
Fit old code with new smgr
#173
laekov
closed
1 year ago
0
[BUG FIX] Fix bugs in stream manager.
#172
zms1999
closed
1 year ago
1
fix cublas gemm call for bf16 input
#171
xptree
closed
1 year ago
1
MOELinear always returns a zero tensor for bf16 input
#170
xptree
closed
1 year ago
1
MoE L2 norm reduce in Megatron
#169
blankde
closed
3 months ago
3
No overlapping observed when enabling Smart Scheduling
#168
chenyu-jiang
opened
1 year ago
8
Update outdated README
#167
zms1999
closed
1 year ago
0
Outdated doc for smart schedule with num_expert > 1?
#166
chenyu-jiang
closed
1 year ago
1
Document for process groups
#165
laekov
closed
1 year ago
0