huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

very large model on multi gpu #14985

Closed ahtamjan closed 2 years ago

ahtamjan commented 2 years ago

I converted XLMR-xxl from fairseq to transformers, but the model is too large to load on a single GPU (it is about 40 GB). I have 4 V100-32GB GPUs. How can I fix this? How can I split the model and load it across 2 GPUs?
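For reference, a quick back-of-envelope sketch of why this fails on one card and works when sharded (the 40 GB figure is from the question; activations, gradients, and optimizer state are deliberately ignored here, so real requirements are higher):

```python
# Back-of-envelope check: do the weights alone fit on V100-32GB cards
# if they are sharded evenly across GPUs? (Assumes 40 GB of weights,
# as stated in the question; ignores activations and optimizer state.)
def per_gpu_weight_gb(model_gb: float, n_gpus: int) -> float:
    """Weight memory per GPU when parameters are split evenly."""
    return model_gb / n_gpus

MODEL_GB = 40.0
GPU_GB = 32.0

for n in (1, 2, 4):
    share = per_gpu_weight_gb(MODEL_GB, n)
    print(f"{n} GPU(s): {share:.0f} GB of weights each, fits: {share < GPU_GB}")
```

So 1 GPU (40 GB > 32 GB) cannot hold the weights, while 2 GPUs (20 GB each) or 4 GPUs (10 GB each) can, leaving headroom for activations, which is why a sharding solution such as DeepSpeed ZeRO-3 is the usual answer here.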

LysandreJik commented 2 years ago

Hi @ahtamjan, I would advise you to read this very complete document that @stas00 created: Transformers docs: Performance and scalability

stas00 commented 2 years ago

Probably head directly to the DeepSpeed docs: https://huggingface.co/docs/transformers/master/en/main_classes/deepspeed#trainer-deepspeed-integration

I will update the performance doc to add that link; it mentions DeepSpeed but doesn't link to it directly.
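To make the pointer above concrete, here is a minimal sketch of a DeepSpeed ZeRO stage-3 config for the Trainer integration. The file name `ds_config.json` and the specific offload/tuning values are assumptions for illustration; consult the DeepSpeed docs linked above for the full option list and recommended settings:

```python
import json

# Minimal ZeRO stage-3 config sketch (values chosen for illustration):
# stage 3 shards parameters, gradients, and optimizer states across GPUs,
# which is what lets a 40 GB model fit on 4x 32 GB cards.
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,                           # shard params, grads, optimizer states
        "offload_param": {"device": "cpu"},   # optionally spill params to CPU RAM
        "offload_optimizer": {"device": "cpu"},
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

This file would then be passed via `TrainingArguments(deepspeed="ds_config.json", ...)` and the job launched with the `deepspeed` launcher rather than plain `python`.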

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.