xrsrke/pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License
78 stars
18 forks
add test #3 (Closed)
xrsrke closed 1 year ago