xrsrke / pipegoose

Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Model partitioning #28

Closed abourramouss closed 9 months ago

abourramouss commented 10 months ago

This is a first approach to partitioning models from Hugging Face. I have tried several approaches and this one seems the most convenient: it flattens the model into a list and assigns each partition a range of indexes into that flattened list.

Not finished yet.
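
To make the idea above concrete, here is a minimal sketch of "flatten, then assign index ranges". It is not the code from this PR: the helper names `flatten` and `partition_ranges` are hypothetical, nested Python lists stand in for a model's module hierarchy, and the even split by item count is an assumption (a real partitioner might balance by parameter count or memory instead).

```python
from typing import Any, List, Tuple


def flatten(tree: Any) -> List[Any]:
    """Recursively flatten nested lists into a flat list of leaves."""
    if isinstance(tree, list):
        leaves: List[Any] = []
        for child in tree:
            leaves.extend(flatten(child))
        return leaves
    return [tree]


def partition_ranges(n_items: int, n_partitions: int) -> List[Tuple[int, int]]:
    """Split range(n_items) into contiguous [start, end) index ranges.

    Range sizes differ by at most one; the first (n_items % n_partitions)
    ranges receive the extra item.
    """
    if n_partitions <= 0:
        raise ValueError("n_partitions must be positive")
    base, extra = divmod(n_items, n_partitions)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(n_partitions):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


# Hypothetical stand-in for a model: strings play the role of leaf modules.
model = [
    "embedding",
    [["attn_0", "mlp_0"], ["attn_1", "mlp_1"], ["attn_2", "mlp_2"]],
    "lm_head",
]

flat = flatten(model)
ranges = partition_ranges(len(flat), 3)
partitions = [flat[start:end] for start, end in ranges]
```

Because each partition is just a pair of indexes into the flattened list, the partitions stay contiguous and in the model's original order, which is what a pipeline stage needs in order to run its slice of layers sequentially.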