xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Add setup.py #31

Open isamu-isozaki opened 9 months ago

isamu-isozaki commented 9 months ago

Hi @xrsrke, just adding a setup.py to prepare for the CUDA port.