microsoft / TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications
MIT License

How to finetune with multiple GPUs in a data-parallel setting? #167

Open kriskrisliu opened 3 months ago

kriskrisliu commented 3 months ago

Many thanks for sharing the amazing work!

I'm trying to finetune a sliced 7B model on a large dataset with millions of samples, but the distribute-model approach appears to be model parallelism. How can we finetune the model with, say, 8 GPUs in a data-parallel setting?
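For context, here is a minimal sketch of plain PyTorch DistributedDataParallel fine-tuning, assuming the sliced model fits on a single GPU so each rank holds a full copy. `load_sliced_model()` and `train_dataset` are hypothetical placeholders for however the SliceGPT checkpoint and data are loaded; they are not APIs from this repository.

```python
# Data-parallel fine-tuning sketch with torch DDP (one full model copy per GPU).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler


def main():
    # torchrun sets LOCAL_RANK / RANK / WORLD_SIZE for each spawned process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    model = load_sliced_model().to(local_rank)    # hypothetical loader for the sliced model
    model = DDP(model, device_ids=[local_rank])   # gradients are all-reduced across ranks

    sampler = DistributedSampler(train_dataset)   # hypothetical dataset; sampler shards it per rank
    loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    for epoch in range(3):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for batch in loader:
            input_ids = batch["input_ids"].to(local_rank)
            labels = batch["labels"].to(local_rank)
            loss = model(input_ids=input_ids, labels=labels).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with `torchrun --nproc_per_node=8 finetune_ddp.py`, this runs 8 data-parallel replicas; if the model does not fit on one GPU, a sharded approach such as FSDP or DeepSpeed ZeRO would be needed instead.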