xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License
77 stars 17 forks

Implement new pipeline parallelism technique #7

Open xrsrke opened 10 months ago

xrsrke commented 10 months ago

The current pipeline parallelism implementation in PipeGoose supports GPipe, which uses hardware less efficiently than other pipeline parallelism techniques. We are considering implementing new pipeline parallelism