Closed: ahtamjan closed this issue 2 years ago.

ahtamjan (original issue): I converted XLMR-xxl from fairseq to transformers, and the model is too large to load on one GPU (the model is about 40 GB). I have 4 V100-32G GPUs; how can I fix this problem? How can I split the model and load it across 2 GPUs?
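For reference, a minimal sketch that the thread itself does not contain: with a recent transformers version and the accelerate package installed, a checkpoint that does not fit on one GPU can be sharded across the visible GPUs at load time with device_map="auto". The checkpoint path below is a hypothetical stand-in for the converted XLMR-xxl model.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical local path to the checkpoint converted from fairseq.
model_path = "./xlmr-xxl-converted"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# device_map="auto" places successive layers on successive GPUs (naive model
# parallelism); loading in fp16 roughly halves the ~40 GB fp32 footprint, so
# two 32 GB V100s can hold the model for inference.
model = AutoModel.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Inputs go to the first device, where the embedding layer usually lives.
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model(**inputs)
```

Note that this only shards the model for inference, with one GPU computing at a time; for training, the DeepSpeed ZeRO route recommended in the reply below is the better fit.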
Hi @ahtamjan, I would advise you to read this very complete document that @stas00 created: Transformers docs: Performance and scalability.
You can probably head directly to the DeepSpeed docs: https://huggingface.co/docs/transformers/master/en/main_classes/deepspeed#trainer-deepspeed-integration (see the sketch after this comment).
I will update the performance doc to add that link, since it mentions DeepSpeed but doesn't currently link to it directly.
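To make the linked docs concrete, here is a minimal sketch of the Trainer + DeepSpeed integration, assuming deepspeed is installed and the script is launched with the deepspeed launcher; the checkpoint path and the tiny dataset are placeholders. ZeRO stage 3 partitions parameters, gradients, and optimizer states across the 4 GPUs, so no single V100 ever has to hold the full ~40 GB model.

```python
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# ZeRO stage 3 config; "auto" values are filled in by the Trainer from
# TrainingArguments. The CPU-offload entries are optional and trade speed
# for extra headroom on top of the 4x 32 GB of GPU memory.
ds_config = {
    "fp16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},
        "offload_optimizer": {"device": "cpu"},
    },
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model_path = "./xlmr-xxl-converted"  # hypothetical path to the converted checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForMaskedLM.from_pretrained(model_path)


class DummyDataset(torch.utils.data.Dataset):
    """Tiny stand-in dataset so the sketch runs end to end."""

    def __init__(self, tok):
        self.enc = tok(["Hello world."] * 8, padding="max_length",
                       truncation=True, max_length=16, return_tensors="pt")

    def __len__(self):
        return self.enc["input_ids"].size(0)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = item["input_ids"].clone()
        return item


args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    fp16=True,
    deepspeed=ds_config,  # the integration also accepts a path to a JSON file
)

trainer = Trainer(model=model, args=args, train_dataset=DummyDataset(tokenizer))
trainer.train()
```

Launched with e.g. `deepspeed --num_gpus=4 train.py`, each GPU then holds only its shard of the parameters. For pure inference without training, DeepSpeed's `deepspeed.init_inference` or the `device_map` approach sketched above are alternatives.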
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.