microsoft/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
MIT License · 685 stars · 85 forks
Issues (newest first)
#240  Question regarding the load importance loss calculation (wangyirui, opened 4 weeks ago, 1 comment)
#239  How about the cost of TUTEL features? (fyang064, opened 1 month ago, 1 comment)
#238  fix(fast_dispatch): saving input tensor using ctx.save_for_backward (KimmiShi, closed 1 month ago, 1 comment)
#237  Potential Memory Leak in GatingEncoder/Decoder of Fast_Dispatch (KimmiShi, closed 1 month ago, 1 comment)
#236  How to use Megablocks in MoE training (CSCYQJ, opened 1 month ago, 1 comment)
#235  add built-in llama_ffn; add helloworld_custom_expert_sharded (ghostplant, closed 2 months ago, 1 comment)
#234  update README.md for v0.3.2 (ghostplant, closed 2 months ago, 0 comments)
#233  Can tutel support Pipeline Parallel? (xcwanAndy, closed 2 months ago, 1 comment)
#232  [Question] Comparison to FasterMoE (Guodanding, opened 2 months ago, 4 comments)
#231  using TUTEL_GLOBAL_TIMEOUT_SEC to make NCCL timeout configurable (ghostplant, closed 2 months ago, 0 comments)
#230  Qs (zws98, opened 2 months ago, 3 comments)
#229  replace unnecessary zeros -> empty (ghostplant, closed 3 months ago, 0 comments)
#228  enable message size larger than 4GB for all_to_all_v/all_gather_v (ghostplant, closed 3 months ago, 0 comments)
#227  add tutel.examples.helloworld_demo based on custom experts (ghostplant, closed 3 months ago, 1 comment)
#226  How to create a custom expert with tutel? (zws98, opened 3 months ago, 19 comments)
#225  update online setup instructions (ghostplant, closed 4 months ago, 0 comments)
#224  Add option to install for CPU only: export NO_CUDA=1 (ghostplant, closed 5 months ago, 0 comments)
#223  add device initialization for ops on non-default devices (ghostplant, closed 6 months ago, 0 comments)
#222  add example files for NCCL all_to_all_v/all_gather_v (ghostplant, closed 6 months ago, 0 comments)
#221  add primitives: net.batch_all_to_all_v(), net.batch_all_gather_v() (ghostplant, closed 6 months ago, 0 comments)
#220  [Question] Why use datatype ncclInt8 in nccl_all_to_all_scatter_async (cicirori, opened 6 months ago, 1 comment)
#219  How to implement Fairseq-MoE training checkpoint like Swin-MoE? (withinmiaov, opened 8 months ago, 1 comment)
#218  Non-surface function utilities only work for contiguous input data (lyd126, opened 8 months ago, 12 comments)
#217  fill zeros with warning for params not defined in state_dict (ghostplant, closed 9 months ago, 0 comments)
#216  Enable running without bias and update ffn instantiation (vchiley, closed 9 months ago, 4 comments)
#215  RuntimeError: (0) == (cuModuleLoadDataEx(&hMod, image.c_str(), sizeof(options) / sizeof(*options), options, values)) INTERNAL ASSERT FAILED (jd730, closed 10 months ago, 3 comments)
#214  tutel is slower than the naive p2p using 2DH for small scale (DongyuXu77, opened 10 months ago, 3 comments)
#213  What is the difference between this and deepspeed-moe? (Hap-Zhang, closed 10 months ago, 2 comments)
#212  update tutel pipeline and setup deps (ghostplant, closed 11 months ago, 0 comments)
#211  numpy not in requirements (152334H, closed 11 months ago, 5 comments)
#210  updt init (vchiley, opened 11 months ago, 7 comments)
#209  fix a few casts (vchiley, closed 11 months ago, 1 comment)
#208  always use torch.distributed.run in new torch versions (ghostplant, closed 11 months ago, 0 comments)
#207  how to use tutel on Megatron Deepspeed (wangyuxin87, opened 1 year ago, 4 comments)
#206  Can this package support the one-gpu machine (momo1986, opened 1 year ago, 5 comments)
#205  add more comment in helloworld_ddp example (ghostplant, closed 1 year ago, 0 comments)
#204  Training with Data and Expert Parallelism (santurini, opened 1 year ago, 5 comments)
#203  INTERNAL ASSERT FAILED (Qicheng-WANG, opened 1 year ago, 5 comments)
#201  about compute_location and locations (adverbial03, opened 1 year ago, 1 comment)
#199  add tutel.examples.helloworld_switch (ghostplant, closed 1 year ago, 0 comments)
#198  ImportError: cannot import name 'tutel_custom_kernel' from 'tutel.impls.jit_compiler' (zhaojiancheng007, opened 1 year ago, 12 comments)
#197  [Bug] The function func_fwd is calculated inconsistently on CPU and GPU (starkhu, closed 1 year ago, 1 comment)
#196  tutel/jit_kernels/sparse.py: torch.float16 CUDA result is inconsistent with the CPU result, and the array goes out of bounds (WsqRichards1, opened 1 year ago, 1 comment)
#195  All2All precision always in fp32 (vchiley, opened 1 year ago, 1 comment)
#194  add reset_parameters fn; updt .to() fn; enable device and dtype pass thru (vchiley, closed 11 months ago, 1 comment)
#193  Fix tutel compatibility in torch 2.0 (ghostplant, closed 1 year ago, 0 comments)
#192  How are the experts' gradients handled under data parallelism? (yzs981130, opened 1 year ago, 1 comment)
#191  removed logit_scale without device casting (Harsh-Sensei, closed 1 year ago, 1 comment)
#190  RuntimeError: No such operator tutel_ops::cumsum (sharkdrop, opened 1 year ago, 10 comments)
#189  [installation errors] fatal error: nccl.h: No such file or directory (qianyuzqy, opened 1 year ago, 1 comment)