xrsrke / pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License · 77 stars · 17 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
| --- | --- | --- | --- | --- |
| #63 | Mod map | yugen-ok | opened 9 months ago | 0 |
| #62 | [Readme] Add contributing guideline | xrsrke | closed 9 months ago | 0 |
| #61 | Multimodal MoE | xrsrke | opened 9 months ago | 0 |
| #60 | Distributed CLIP | xrsrke | opened 9 months ago | 0 |
| #59 | DiLoCo replication (DiLoCo: Distributed Low-Communication Training of Language Models) | xrsrke | opened 9 months ago | 0 |
| #58 | [Readme] Fix title | xrsrke | closed 9 months ago | 0 |
| #57 | [Readme] Change installation from git | xrsrke | closed 9 months ago | 0 |
| #56 | Feature/moe | xrsrke | closed 9 months ago | 0 |
| #55 | Feature/moe | xrsrke | closed 9 months ago | 0 |
| #54 | FP8 linear | 3outeille | opened 9 months ago | 0 |
| #53 | [Feature] Add retrieving auxiliary and Z losses from ExpertLoss | xrsrke | closed 9 months ago | 0 |
| #52 | Making pipeline parallelism compatible with `transformers` | xrsrke | opened 9 months ago | 0 |
| #51 | [BUG] Fix the bug where tokens can't be dispatched when the input has… | xrsrke | closed 9 months ago | 0 |
| #50 | Port cuda kernels | 3outeille | opened 9 months ago | 0 |
| #49 | [Feature] Add ExpertParallel with Top1 routing | xrsrke | closed 9 months ago | 0 |
| #48 | [Refactor] Use small bloom model in model partitioning's test | xrsrke | closed 9 months ago | 0 |
| #47 | [Refactor] Remove sample input in model partitioning for | xrsrke | closed 9 months ago | 0 |
| #46 | [Refactor] Apply pre-commit to model partitioner | xrsrke | closed 9 months ago | 0 |
| #45 | End-to-end FP8 training | xrsrke | opened 9 months ago | 1 |
| #44 | [Fix] Name generalization of transformer blocks | abourramouss | closed 9 months ago | 0 |
| #43 | Add expert loss function | danielgrittner | closed 9 months ago | 0 |
| #42 | [Feature] support the forward pass of automatic pipeline parallelism for 🤗 transformers | xrsrke | closed 9 months ago | 0 |
| #41 | [Bug Fix] Balance transformer blocks across shards | abourramouss | closed 9 months ago | 0 |
| #40 | Automatic module mapping using torch.fx | xrsrke | opened 9 months ago | 3 |
| #39 | [Feature] Add Expert Parallel | xrsrke | closed 9 months ago | 0 |
| #38 | Bug/fix hybrid tp dp | 3outeille | opened 10 months ago | 1 |
| #37 | Tensor Parallelism | 3outeille | opened 10 months ago | 0 |
| #36 | Issue #10: Kernel Fusion using torch.jit | sami-bg | opened 10 months ago | 0 |
| #35 | Logger branch | KevorkSulahian | opened 10 months ago | 2 |
| #34 | Deparallelize pipeline parallelism | xrsrke | closed 9 months ago | 0 |
| #33 | Distributed Logger | xrsrke | opened 10 months ago | 1 |
| #32 | WIP: Starting cuda setup | isamu-isozaki | opened 10 months ago | 3 |
| #31 | Add setup py | isamu-isozaki | opened 10 months ago | 0 |
| #30 | [FEATURE] Add demo APIs for Expert Parallel (still in progress) | xrsrke | closed 10 months ago | 0 |
| #29 | Save and load checkpoints | xrsrke | opened 10 months ago | 0 |
| #28 | Model partitioning | abourramouss | closed 9 months ago | 0 |
| #27 | Support parallelizing arbitrary transformer torch modules | xrsrke | closed 10 months ago | 0 |
| #26 | Support TPU | xrsrke | closed 10 months ago | 0 |
| #25 | Lazy initialization of massive models | xrsrke | opened 10 months ago | 1 |
| #24 | Checkpointing | xrsrke | closed 10 months ago | 4 |
| #23 | WIP: Trainer | isamu-isozaki | opened 10 months ago | 7 |
| #22 | Sequence Parallelism | xrsrke | opened 10 months ago | 1 |
| #21 | Callbacks for Distributed Optimizer | xrsrke | closed 9 months ago | 0 |
| #20 | ZeRO-1 | xrsrke | closed 10 months ago | 0 |
| #19 | Mixture of Experts | xrsrke | opened 10 months ago | 0 |
| #18 | Trainer | xrsrke | opened 10 months ago | 4 |
| #17 | Implement new tensor parallelism technique | xrsrke | opened 10 months ago | 0 |
| #16 | Setup documentation | xrsrke | opened 10 months ago | 0 |
| #15 | Reproducible in 3D Parallelism | xrsrke | closed 10 months ago | 0 |
| #14 | Mixed precision training in FP16 | xrsrke | opened 10 months ago | 0 |
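Several of the issues above revolve around top-1 expert routing and its auxiliary losses (#49 "Add ExpertParallel with Top1 routing", #43 "Add expert loss function", #53 "retrieving auxiliary and Z losses from ExpertLoss"). For context, the sketch below shows the general technique in plain PyTorch: switch-style top-1 routing with the standard load-balancing auxiliary loss. This is a minimal, generic illustration; the `Top1Router` class and all names in it are hypothetical and do not reflect pipegoose's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1Router(nn.Module):
    """Hypothetical sketch of switch-style top-1 MoE routing.

    Each token is assigned to exactly one expert, and an auxiliary
    load-balancing loss encourages uniform expert utilization.
    Not pipegoose's API -- an illustration of the technique only.
    """

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.gate(x)                # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        expert_idx = probs.argmax(dim=-1)    # chosen expert per token
        gate_score = probs.gather(-1, expert_idx.unsqueeze(-1)).squeeze(-1)

        # Auxiliary load-balancing loss (Switch Transformer style):
        # (fraction of tokens dispatched to each expert) dotted with
        # (mean router probability per expert), scaled by num_experts.
        num_experts = probs.size(-1)
        token_fraction = F.one_hot(expert_idx, num_experts).float().mean(dim=0)
        prob_fraction = probs.mean(dim=0)
        aux_loss = num_experts * torch.sum(token_fraction * prob_fraction)

        return expert_idx, gate_score, aux_loss


# Usage: route a batch of 8 tokens among 4 experts.
router = Top1Router(d_model=16, num_experts=4)
tokens = torch.randn(8, 16)
expert_idx, gate_score, aux_loss = router(tokens)
```

In a full MoE layer, `expert_idx` drives token dispatch to the parallel experts, `gate_score` weights each expert's output, and `aux_loss` is added to the training objective alongside the task loss, much as the ExpertLoss-related issues above describe.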