laekov fastmoe issues - Githubissues

laekov / fastmoe

A fast MoE impl for PyTorch

https://fastmoe.ai

Apache License 2.0

1.57k stars, 189 forks

issues

Detailed documentation about model parallelism

#214 ZSL98 opened 1 month ago
0
In Smart Schedule, the R operation does not overlap with the C operation

#213 WhatBrain opened 1 month ago
5
bash run_enwik8_base.sh train train --work_dir /dir/

#212 WYCAS closed 2 months ago
0
how to run transformer-xl with parallel experts with single gpu?

#211 HudashiNeo opened 2 months ago
6
Do we support DeepSpeed training? Thanks.

#210 lzl-mt opened 2 months ago
1
Forward pass return value is missing bal_loss

#209 tisgotos opened 2 months ago
2
Hello, where can I get v2.2 of Megatron-LM?

#208 tisgotos closed 2 months ago
7
Error when running examples/transformer-xl/scripts/run_enwik8_base_moe.sh with Smart Schedule enabled

#207 WhatBrain opened 2 months ago
6
No hiding output when using `pytest -s`

#206 roastduck closed 5 months ago
0
Make the code neutral to device by removing `.cuda()`

#205 roastduck closed 6 months ago
0
FasterMoE Shadow Policy: Detailed Inquiry

#204 Guodanding closed 7 months ago
7
Update readme-cn.md

#203 HelloWorldLTY closed 7 months ago
0
DDP error

#202 Peg-Wu closed 7 months ago
0
CUDA memory increases after each loss.backward()

#201 sreetamasarkar opened 8 months ago
6
Update switch_gate.py

#200 Heihaierr closed 8 months ago
0
A bug in switch_gate

#199 Heihaierr opened 8 months ago
6
About switch_gate

#198 Heihaierr opened 8 months ago
1
multi-node problem

#197 Qianshaowei opened 8 months ago
1
Example to run Megatron

#196 Juanhui28 opened 9 months ago
3
[BUG] AttributeError: module 'fmoe_cuda' has no attribute 'assign_pos_'

#195 pangsg opened 9 months ago
3
cudaErrorInvalidDevice reported when running FMOE

#194 pangsg closed 9 months ago
6
Does fastmoe support fine-tuning?

#193 PowerDispatch closed 9 months ago
0
Does fastmoe support fine-tuning, paged attention, FlashAttention, KV cache, mixed precision, etc.?

#192 PowerDispatch opened 9 months ago
4
Can fastmoe be integrated into vLLM?

#191 pangsg opened 9 months ago
4
The prep_text8.py script is missing

#190 PowerDispatch closed 9 months ago
1
Do we have an online chat group?

#189 PowerDispatch opened 9 months ago
1
Hello, how do I define an MoE under dp+mp in fastmoe?

#188 daixiangzi closed 9 months ago
6
This PR resolves issue #186

#187 Cobalt-27 closed 10 months ago
0
num_experts argument error for Megatron-LM

#186 Cobalt-27 closed 10 months ago
0
[Feature] Make bias of gate optional for naive_gate and its subclasses.

#185 Zhang-RQ closed 10 months ago
0
Segmentation fault when Smart Schedule is enabled

#184 Xingzhi107 opened 11 months ago
8
pytest error

#183 R-QinQ opened 11 months ago
3
setup.py error!

#182 R-QinQ closed 11 months ago
4
ImportError: cannot import name 'get_args' from 'megatron'

#181 peter-fei opened 11 months ago
5
During inference, the output of noisy gate is nan.

#180 zqhang opened 12 months ago
5
Inconsistent evaluation result when clone expert parameters from original FFN

#179 Heihaierr closed 1 year ago
1
MOELinear is much slower than torch.nn.Linear

#178 kamanphoebe closed 1 year ago
7
ModuleNotFoundError: No module named 'fmoe_cuda'

#177 Taskii-Lei opened 1 year ago
3
how to use balance loss?

#176 Heihaierr opened 1 year ago
1
update clip-grad-v2.2.patch for grads_in_moe is empty

#175 Fragile-azalea closed 1 year ago
0
Fix tests

#174 laekov closed 1 year ago
0
Fit old code with new smgr

#173 laekov closed 1 year ago
0
[BUG FIX] Fix bugs in stream manager.

#172 zms1999 closed 1 year ago
1
fix cublas gemm call for bf16 input

#171 xptree closed 1 year ago
1
MOELinear always returns a zero tensor for bf16 input

#170 xptree closed 1 year ago
1
MoE L2 norm reduce in Megatron

#169 blankde closed 3 months ago
3
No overlapping observed when enabling Smart Scheduling

#168 chenyu-jiang opened 1 year ago
8
Update outdated README

#167 zms1999 closed 1 year ago
0
Outdated doc for smart schedule with num_expert > 1?

#166 chenyu-jiang closed 1 year ago
1
Document for process groups

#165 laekov closed 1 year ago
0