xrsrke / pipegoose

Large-scale 4D parallelism pre-training for 🤗 transformers with Mixture of Experts *(still a work in progress)*
MIT License

Implement new tensor parallelism technique #17

Open · xrsrke opened this issue 10 months ago

xrsrke commented 10 months ago

There are newer tensor parallelism techniques that improve on Megatron-LM's (1D) tensor parallelism. We are considering implementing 2D, 2.5D, or 3D tensor parallelism.
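For context, here is a minimal sketch of the SUMMA-style block matmul that 2D tensor parallelism builds on. It assumes a `grid_size × grid_size` process grid laid out row-major over the global ranks, with `row_group` / `col_group` process groups already created for this rank's row and column; the function name and arguments are illustrative and not part of pipegoose's API.

```python
# Hedged sketch of the SUMMA communication pattern behind 2D tensor parallelism.
# Assumes ranks 0..q*q-1 arranged row-major on a q x q grid; not pipegoose code.
import torch
import torch.distributed as dist


def summa_matmul(a_block: torch.Tensor,
                 b_block: torch.Tensor,
                 row_group: dist.ProcessGroup,
                 col_group: dist.ProcessGroup,
                 grid_size: int) -> torch.Tensor:
    """Compute this rank's block of C = A @ B, where A, B, and C are each
    split into grid_size x grid_size blocks and rank (i, j) holds block [i][j]."""
    rank = dist.get_rank()
    i, j = divmod(rank, grid_size)  # this rank's (row, column) position in the grid
    c_block = torch.zeros(a_block.size(0), b_block.size(1),
                          dtype=a_block.dtype, device=a_block.device)

    for k in range(grid_size):
        # Rank (i, k) broadcasts A[i][k] along row i.
        a_recv = a_block.clone() if j == k else torch.empty_like(a_block)
        dist.broadcast(a_recv, src=i * grid_size + k, group=row_group)

        # Rank (k, j) broadcasts B[k][j] along column j.
        b_recv = b_block.clone() if i == k else torch.empty_like(b_block)
        dist.broadcast(b_recv, src=k * grid_size + j, group=col_group)

        # Accumulate the partial product A[i][k] @ B[k][j].
        c_block += a_recv @ b_recv

    return c_block
```

Roughly speaking, 2.5D tensor parallelism stacks several such grids along a depth dimension to trade extra memory for less communication, and 3D tensor parallelism partitions the operands over a `q × q × q` cube instead of a 2D grid.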

Reading