xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License
78 stars 18 forks

add test #3

Closed (xrsrke closed 1 year ago)