microsoft / Tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License · 724 stars · 93 forks

Issues
#251 · How are expert parameters distributed in the cluster when using the Tutel framework? · luuck · opened 1 day ago · 1 comment
#250 · solved · hxtyy1201 · closed 1 day ago · 0 comments
#249 · Resolve compatibility with tutel.checkpoint.* and SWIN-MoE ckpt · ghostplant · closed 3 days ago · 0 comments
#248 · How to load a 32-expert Swin-Transformer-MoE on a 2-GPU machine · ywxsuperstar · opened 4 days ago · 8 comments
#247 · Question: Dictionary of Optimal Parallelism & Pipelining · hikettei · closed 2 months ago · 2 comments
#246 · How to convert checkpoint files to adapt to different distributed world sizes · swjtulinxi · opened 2 months ago · 1 comment
#245 · fix llama_ffn forward function · pingzhili · closed 2 months ago · 1 comment
#244 · Implementation of Llama FFN · pingzhili · closed 2 months ago · 2 comments
#243 · Add custom data path to cifar10 · anirudhprabhakaran3 · closed 2 months ago · 0 comments
#242 · fix scripts to support Tutel CPU on Mac OS X · ghostplant · closed 3 months ago · 0 comments
#241 · Make it compatible with ROCm >= 6.0 · ghostplant · closed 3 months ago · 0 comments
#240 · Question regarding the load importance loss calculation · wangyirui · opened 4 months ago · 1 comment
#239 · What is the cost of TUTEL features? · fyang064 · opened 4 months ago · 1 comment
#238 · fix(fast_dispatch): saving input tensor using ctx.save_for_backward · KimmiShi · closed 4 months ago · 1 comment
#237 · Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch · KimmiShi · closed 4 months ago · 1 comment
#236 · How to use Megablocks in MoE training · CSCYQJ · opened 4 months ago · 1 comment
#235 · add built-in llama_ffn; add helloworld_custom_expert_sharded · ghostplant · closed 5 months ago · 1 comment
#234 · update README.md for v0.3.2 · ghostplant · closed 5 months ago · 0 comments
#233 · Can tutel support pipeline parallelism? · xcwanAndy · closed 6 months ago · 1 comment
#232 · [Question] Comparison to FasterMoE · Guodanding · opened 6 months ago · 4 comments
#231 · using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable · ghostplant · closed 6 months ago · 0 comments
#230 · Qs · zws98 · opened 6 months ago · 3 comments
#229 · replace unnecessary zeros -> empty · ghostplant · closed 6 months ago · 0 comments
#228 · enable message size larger than 4GB for all_to_all_v/all_gather_v · ghostplant · closed 7 months ago · 0 comments
#227 · add tutel.examples.helloworld_demo based on custom experts · ghostplant · closed 7 months ago · 1 comment
#226 · How to create a custom expert with tutel? · zws98 · opened 7 months ago · 19 comments
#225 · update online setup instructions · ghostplant · closed 8 months ago · 0 comments
#224 · Add option to install for CPU only: export NO_CUDA=1 · ghostplant · closed 8 months ago · 0 comments
#223 · add device initialization for ops on non-default devices · ghostplant · closed 9 months ago · 0 comments
#222 · add example files for NCCL all_to_all_v/all_gather_v · ghostplant · closed 9 months ago · 0 comments
#221 · add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v() · ghostplant · closed 10 months ago · 0 comments
#220 · [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async? · cicirori · opened 10 months ago · 1 comment
#219 · How to implement Fairseq-MoE training checkpoint like Swin-MoE? · withinmiaov · opened 11 months ago · 1 comment
#218 · Non-surface function utilities only work for contiguous input data · lyd126 · opened 12 months ago · 12 comments
#217 · fill zeros with warning for params not defined in state_dict · ghostplant · closed 1 year ago · 0 comments
#216 · Enable running without bias and update ffn instantiation · vchiley · closed 1 year ago · 4 comments
#215 · RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED · jd730 · closed 1 year ago · 3 comments
#214 · tutel is slower than naive p2p when using 2DH at small scale · DongyuXu77 · opened 1 year ago · 3 comments
#213 · What is the difference between this and deepspeed-moe? · Hap-Zhang · closed 1 year ago · 2 comments
#212 · update tutel pipeline and setup deps · ghostplant · closed 1 year ago · 0 comments
#211 · numpy not in requirements · 152334H · closed 1 year ago · 5 comments
#210 · update init · vchiley · opened 1 year ago · 7 comments
#209 · fix a few casts · vchiley · closed 1 year ago · 1 comment
#208 · always use torch.distributed.run in new torch versions · ghostplant · closed 1 year ago · 0 comments
#207 · How to use tutel with Megatron-DeepSpeed · wangyuxin87 · opened 1 year ago · 4 comments
#206 · Can this package support a single-GPU machine? · momo1986 · opened 1 year ago · 5 comments
#205 · add more comments in helloworld_ddp example · ghostplant · closed 1 year ago · 0 comments
#204 · Training with Data and Expert Parallelism · santurini · opened 1 year ago · 11 comments
#203 · INTERNAL ASSERT FAILED · Qicheng-WANG · opened 1 year ago · 5 comments
#201 · about compute_location and locations · adverbial03 · opened 1 year ago · 1 comment