xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Tensor Parallelism #37

Open 3outeille opened 9 months ago

3outeille commented 9 months ago

As described in the README, a bug was found when performing Tensor Parallelism (see wandb log). PR #38 fixes it.