microsoft / Tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License · 724 stars · 93 forks

Issues
#251 · How are expert parameters distributed in the cluster when using the Tutel framework? · luuck · opened 1 day ago · 1 comment
#250 · solved · hxtyy1201 · closed 1 day ago · 0 comments
#249 · Resolve compatibility with tutel.checkpoint.* and SWIN-MoE ckpt · ghostplant · closed 3 days ago · 0 comments
#248 · How to load a 32-expert Swin-Transformer-MoE on a 2-GPU machine · ywxsuperstar · opened 4 days ago · 8 comments
#247 · Question: Dictionary of Optimal Parallelism & Pipelining · hikettei · closed 2 months ago · 2 comments
#246 · How to convert checkpoint files to adapt to different distributed world sizes · swjtulinxi · opened 2 months ago · 1 comment
#245 · fix llama_ffn forward function · pingzhili · closed 2 months ago · 1 comment
#244 · Implementation of Llama FFN · pingzhili · closed 2 months ago · 2 comments
#243 · Add custom data path to cifar10 · anirudhprabhakaran3 · closed 2 months ago · 0 comments
#242 · fix scripts to support Tutel CPU on Mac OS X · ghostplant · closed 3 months ago · 0 comments
#241 · Make it compatible with ROCm >= 6.0 · ghostplant · closed 3 months ago · 0 comments
#240 · Question regarding the load importance loss calculation · wangyirui · opened 4 months ago · 1 comment
#239 · What is the cost of TUTEL features? · fyang064 · opened 4 months ago · 1 comment
#238 · fix(fast_dispatch): saving input tensor using ctx.save_for_backward · KimmiShi · closed 4 months ago · 1 comment
#237 · Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch · KimmiShi · closed 4 months ago · 1 comment
#236 · How to use Megablocks in MoE training · CSCYQJ · opened 4 months ago · 1 comment
#235 · add built-in llama_ffn; add helloworld_custom_expert_sharded · ghostplant · closed 5 months ago · 1 comment
#234 · update README.md for v0.3.2 · ghostplant · closed 5 months ago · 0 comments
#233 · Can tutel support pipeline parallelism? · xcwanAndy · closed 6 months ago · 1 comment
#232 · [Question] Comparison to FasterMoE · Guodanding · opened 6 months ago · 4 comments
#231 · using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable · ghostplant · closed 6 months ago · 0 comments
#230 · Qs · zws98 · opened 6 months ago · 3 comments
#229 · replace unnecessary zeros -> empty · ghostplant · closed 6 months ago · 0 comments
#228 · enable message size larger than 4GB for all_to_all_v/all_gather_v · ghostplant · closed 7 months ago · 0 comments
#227 · add tutel.examples.helloworld_demo based on custom experts · ghostplant · closed 7 months ago · 1 comment
#226 · How to create a custom expert with tutel? · zws98 · opened 7 months ago · 19 comments
#225 · update online setup instructions · ghostplant · closed 8 months ago · 0 comments
#224 · Add option to install for CPU only: export NO_CUDA=1 · ghostplant · closed 8 months ago · 0 comments
#223 · add device initialization for ops on non-default devices · ghostplant · closed 9 months ago · 0 comments
#222 · add example files for NCCL all_to_all_v/all_gather_v · ghostplant · closed 9 months ago · 0 comments
#221 · add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v() · ghostplant · closed 10 months ago · 0 comments
#220 · [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async? · cicirori · opened 10 months ago · 1 comment
#219 · How to implement Fairseq-MoE training checkpoint like Swin-MoE? · withinmiaov · opened 11 months ago · 1 comment
#218 · Non-surface function utilities only work for contiguous input data · lyd126 · opened 12 months ago · 12 comments
#217 · fill zeros with warning for params not defined in state_dict · ghostplant · closed 1 year ago · 0 comments
#216 · Enable running without bias and update ffn instantiation · vchiley · closed 1 year ago · 4 comments
#215 · RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED · jd730 · closed 1 year ago · 3 comments
#214 · tutel is slower than naive p2p when using 2DH at small scale · DongyuXu77 · opened 1 year ago · 3 comments
#213 · What is the difference between this and deepspeed-moe? · Hap-Zhang · closed 1 year ago · 2 comments
#212 · update tutel pipeline and setup deps · ghostplant · closed 1 year ago · 0 comments
#211 · numpy not in requirements · 152334H · closed 1 year ago · 5 comments
#210 · update init · vchiley · opened 1 year ago · 7 comments
#209 · fix a few casts · vchiley · closed 1 year ago · 1 comment
#208 · always use torch.distributed.run in new torch versions · ghostplant · closed 1 year ago · 0 comments
#207 · How to use tutel with Megatron-DeepSpeed · wangyuxin87 · opened 1 year ago · 4 comments
#206 · Can this package support a single-GPU machine? · momo1986 · opened 1 year ago · 5 comments
#205 · add more comments in helloworld_ddp example · ghostplant · closed 1 year ago · 0 comments
#204 · Training with Data and Expert Parallelism · santurini · opened 1 year ago · 11 comments
#203 · INTERNAL ASSERT FAILED · Qicheng-WANG · opened 1 year ago · 5 comments
#201 · about compute_location and locations · adverbial03 · opened 1 year ago · 1 comment