xrsrke / pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License · 77 stars · 17 forks
Issues (newest first)
| # | Title | Author | Status | Comments |
| --- | --- | --- | --- | --- |
| #63 | Mod map | yugen-ok | opened 9 months ago | 0 |
| #62 | [Readme] Add contributing guideline | xrsrke | closed 9 months ago | 0 |
| #61 | Multimodal MoE | xrsrke | opened 9 months ago | 0 |
| #60 | Distributed CLIP | xrsrke | opened 9 months ago | 0 |
| #59 | DiLoCo replication (DiLoCo: Distributed Low-Communication Training of Language Models) | xrsrke | opened 9 months ago | 0 |
| #58 | [Readme] Fix title | xrsrke | closed 9 months ago | 0 |
| #57 | [Readme] Change installation from git | xrsrke | closed 9 months ago | 0 |
| #56 | Feature/moe | xrsrke | closed 9 months ago | 0 |
| #55 | Feature/moe | xrsrke | closed 9 months ago | 0 |
| #54 | FP8 linear | 3outeille | opened 9 months ago | 0 |
| #53 | [Feature] Add retrieving auxiliary and Z losses from ExpertLoss | xrsrke | closed 9 months ago | 0 |
| #52 | Making pipeline parallelism compatible with `transformers` | xrsrke | opened 9 months ago | 0 |
| #51 | [BUG] Fix the bug where tokens can't be dispatched when the input has… | xrsrke | closed 9 months ago | 0 |
| #50 | Port cuda kernels | 3outeille | opened 9 months ago | 0 |
| #49 | [Feature] Add ExpertParallel with Top1 routing | xrsrke | closed 9 months ago | 0 |
| #48 | [Refactor] Use small bloom model in model partitioning's test | xrsrke | closed 9 months ago | 0 |
| #47 | [Refactor] Remove sample input in model partitioning for | xrsrke | closed 9 months ago | 0 |
| #46 | [Refactor] Apply pre-commit to model partitioner | xrsrke | closed 9 months ago | 0 |
| #45 | End-to-end FP8 training | xrsrke | opened 9 months ago | 1 |
| #44 | [Fix] Name generalization of transformer blocks | abourramouss | closed 9 months ago | 0 |
| #43 | Add expert loss function | danielgrittner | closed 9 months ago | 0 |
| #42 | [Feature] support the forward pass of automatic pipeline parallelism for 🤗 transformers | xrsrke | closed 9 months ago | 0 |
| #41 | [Bug Fix] Balance transformer blocks across shards | abourramouss | closed 9 months ago | 0 |
| #40 | Automatic module mapping using torch.fx | xrsrke | opened 9 months ago | 3 |
| #39 | [Feature] Add Expert Parallel | xrsrke | closed 9 months ago | 0 |
| #38 | Bug/fix hybrid tp dp | 3outeille | opened 10 months ago | 1 |
| #37 | Tensor Parallelism | 3outeille | opened 10 months ago | 0 |
| #36 | Issue #10: Kernel Fusion using torch.jit | sami-bg | opened 10 months ago | 0 |
| #35 | Logger branch | KevorkSulahian | opened 10 months ago | 2 |
| #34 | Deparallelize pipeline parallelism | xrsrke | closed 9 months ago | 0 |
| #33 | Distributed Logger | xrsrke | opened 10 months ago | 1 |
| #32 | WIP: Starting cuda setup | isamu-isozaki | opened 10 months ago | 3 |
| #31 | Add setup py | isamu-isozaki | opened 10 months ago | 0 |
| #30 | [FEATURE] Add demo APIs for Expert Parallel (still in progress) | xrsrke | closed 10 months ago | 0 |
| #29 | Save and load checkpoints | xrsrke | opened 10 months ago | 0 |
| #28 | Model partitioning | abourramouss | closed 9 months ago | 0 |
| #27 | Support parallelizing arbitrary transformer torch modules | xrsrke | closed 10 months ago | 0 |
| #26 | Support TPU | xrsrke | closed 10 months ago | 0 |
| #25 | Lazy initialization of massive models | xrsrke | opened 10 months ago | 1 |
| #24 | Checkpointing | xrsrke | closed 10 months ago | 4 |
| #23 | WIP: Trainer | isamu-isozaki | opened 10 months ago | 7 |
| #22 | Sequence Parallelism | xrsrke | opened 10 months ago | 1 |
| #21 | Callbacks for Distributed Optimizer | xrsrke | closed 9 months ago | 0 |
| #20 | ZeRO-1 | xrsrke | closed 10 months ago | 0 |
| #19 | Mixture of Experts | xrsrke | opened 10 months ago | 0 |
| #18 | Trainer | xrsrke | opened 10 months ago | 4 |
| #17 | Implement new tensor parallelism technique | xrsrke | opened 10 months ago | 0 |
| #16 | Setup documentation | xrsrke | opened 10 months ago | 0 |
| #15 | Reproducible in 3D Parallelism | xrsrke | closed 10 months ago | 0 |
| #14 | Mixed precision training in FP16 | xrsrke | opened 10 months ago | 0 |
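Several of the issues above revolve around top-1 expert routing and its auxiliary losses (#49 "Add ExpertParallel with Top1 routing", #43 "Add expert loss function", #53 "retrieving auxiliary and Z losses from ExpertLoss"). For context, the sketch below shows the general technique in plain PyTorch: switch-style top-1 routing with the standard load-balancing auxiliary loss. This is a minimal, generic illustration; the `Top1Router` class and all names in it are hypothetical and do not reflect pipegoose's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top1Router(nn.Module):
    """Hypothetical sketch of switch-style top-1 MoE routing.

    Each token is assigned to exactly one expert, and an auxiliary
    load-balancing loss encourages uniform expert utilization.
    Not pipegoose's API -- an illustration of the technique only.
    """

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        logits = self.gate(x)                # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        expert_idx = probs.argmax(dim=-1)    # chosen expert per token
        gate_score = probs.gather(-1, expert_idx.unsqueeze(-1)).squeeze(-1)

        # Auxiliary load-balancing loss (Switch Transformer style):
        # (fraction of tokens dispatched to each expert) dotted with
        # (mean router probability per expert), scaled by num_experts.
        num_experts = probs.size(-1)
        token_fraction = F.one_hot(expert_idx, num_experts).float().mean(dim=0)
        prob_fraction = probs.mean(dim=0)
        aux_loss = num_experts * torch.sum(token_fraction * prob_fraction)

        return expert_idx, gate_score, aux_loss


# Usage: route a batch of 8 tokens among 4 experts.
router = Top1Router(d_model=16, num_experts=4)
tokens = torch.randn(8, 16)
expert_idx, gate_score, aux_loss = router(tokens)
```

In a full MoE layer, `expert_idx` drives token dispatch to the parallel experts, `gate_score` weights each expert's output, and `aux_loss` is added to the training objective alongside the task loss, much as the ExpertLoss-related issues above describe.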