microsoft/Tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License · 737 stars · 93 forks
Issues (newest first)
#254  do destroy_process_group() upon exit to prevent warning alerts  (ghostplant, closed, 6 days ago, 0 comments)
#253  check parent_group when creating subgroups  (ghostplant, closed, 1 week ago, 0 comments)
#252  add create_standalone_group() in tutel.net  (ghostplant, closed, 1 week ago, 0 comments)
#251  How expert parameters are distributed in the cluster when using the Tutel framework?  (luuck, open, 4 weeks ago, 1 comment)
#250  solved  (hxtyy1201, closed, 4 weeks ago, 0 comments)
#249  Resolve compatibility with tutel.checkpoint.* and SWIN-MoE ckpt  (ghostplant, closed, 1 month ago, 0 comments)
#248  How to load 32-experts Swin-transformer-moe on a 2-GPU machine.  (ywxsuperstar, open, 1 month ago, 8 comments)
#247  Question: Dictionary of Optimal Parallelism & Pipelining  (hikettei, closed, 3 months ago, 2 comments)
#246  How to convert checkpoint files that adapt to different distributed world sizes  (swjtulinxi, open, 3 months ago, 1 comment)
#245  fix llama_ffn forward function  (pingzhili, closed, 3 months ago, 1 comment)
#244  Implementation of Llama FFN  (pingzhili, closed, 3 months ago, 2 comments)
#243  Add custom data path to cifar10  (anirudhprabhakaran3, closed, 3 months ago, 0 comments)
#242  fix scripts to support Tutel CPU on Mac OS X  (ghostplant, closed, 4 months ago, 0 comments)
#241  Make it compatible with ROCm >= 6.0  (ghostplant, closed, 4 months ago, 0 comments)
#240  Question regarding the load importance loss calculation  (wangyirui, open, 5 months ago, 1 comment)
#239  How about the cost of TUTEL features?  (fyang064, open, 5 months ago, 1 comment)
#238  fix(fast_dispatch): saving input tensor using ctx.save_for_backward  (KimmiShi, closed, 5 months ago, 1 comment)
#237  Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch  (KimmiShi, closed, 5 months ago, 1 comment)
#236  How to use Megablocks in MoE training  (CSCYQJ, open, 5 months ago, 1 comment)
#235  add built-in llama_ffn; add helloworld_custom_expert_sharded;  (ghostplant, closed, 6 months ago, 1 comment)
#234  update README.md for v0.3.2  (ghostplant, closed, 6 months ago, 0 comments)
#233  Can tutel support Pipeline Parallel?  (xcwanAndy, closed, 7 months ago, 1 comment)
#232  [Question] Comparison to FasterMoE  (Guodanding, open, 7 months ago, 4 comments)
#231  using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable  (ghostplant, closed, 7 months ago, 0 comments)
#230  Qs  (zws98, open, 7 months ago, 3 comments)
#229  replace unnecessary zeros -> empty  (ghostplant, closed, 7 months ago, 0 comments)
#228  enable message size larger than 4GB for all_to_all_v/all_gather_v  (ghostplant, closed, 8 months ago, 0 comments)
#227  add tutel.examples.helloworld_demo based on custom experts  (ghostplant, closed, 8 months ago, 1 comment)
#226  How to create a custom expert with tutel?  (zws98, open, 8 months ago, 19 comments)
#225  update online setup instructions  (ghostplant, closed, 9 months ago, 0 comments)
#224  Add option to install for CPU only: export NO_CUDA=1  (ghostplant, closed, 9 months ago, 0 comments)
#223  add device initialization for ops on non-default devices  (ghostplant, closed, 10 months ago, 0 comments)
#222  add example files for NCCL all_to_all_v/all_gather_v  (ghostplant, closed, 10 months ago, 0 comments)
#221  add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v()  (ghostplant, closed, 11 months ago, 0 comments)
#220  [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async.  (cicirori, open, 11 months ago, 1 comment)
#219  How to implement Fairseq-MoE training checkpoint like Swin-MoE?  (withinmiaov, open, 1 year ago, 1 comment)
#218  Non-surface function utilities only work for contiguous input data  (lyd126, open, 1 year ago, 12 comments)
#217  fill zeros with warning for params not defined in state_dict  (ghostplant, closed, 1 year ago, 0 comments)
#216  Enable running without bias and update ffn instantiation  (vchiley, closed, 1 year ago, 4 comments)
#215  RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED  (jd730, closed, 1 year ago, 3 comments)
#214  tutel is slower than the naive p2p using 2DH for small scale  (DongyuXu77, open, 1 year ago, 3 comments)
#213  What is the difference between this and deepspeed-moe?  (Hap-Zhang, closed, 1 year ago, 2 comments)
#212  update tutel pipeline and setup deps  (ghostplant, closed, 1 year ago, 0 comments)
#211  numpy not in requirements  (152334H, closed, 1 year ago, 5 comments)
#210  updt init  (vchiley, open, 1 year ago, 7 comments)
#209  fix a few casts  (vchiley, closed, 1 year ago, 1 comment)
#208  always use torch.distributed.run in new torch versions  (ghostplant, closed, 1 year ago, 0 comments)
#207  how to use tutel on Megatron Deepspeed  (wangyuxin87, open, 1 year ago, 4 comments)
#206  Can this package support the one-gpu machine  (momo1986, open, 1 year ago, 5 comments)
#205  add more comment in helloworld_ddp example  (ghostplant, closed, 1 year ago, 0 comments)