I wonder if you have tried training the T5 11B-parameter model on a single node with 8 GPUs for the single-task full fine-tuning case? I have not been able to get past CUDA OOM errors with this repo's codebase, even with the per-device batch size set to 1 for both training and eval, on a p4d.24xlarge machine with 8 GPUs.
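For context, a rough back-of-the-envelope estimate (assuming standard fp32 Adam and plain data parallelism, where every GPU holds a full model replica — which may not match this repo's exact setup) suggests why even batch size 1 runs out of memory on 40 GB A100s:

```python
# Rough per-GPU memory estimate for full fine-tuning of an 11B-parameter
# model with Adam, assuming plain data parallelism (a full replica of
# weights, gradients, and optimizer state on each GPU) in fp32.
params = 11e9

weights = params * 4       # fp32 weights: 4 bytes per parameter
grads = params * 4         # fp32 gradients: 4 bytes per parameter
adam_states = params * 8   # Adam momentum + variance: 2 x 4 bytes per parameter

total_gb = (weights + grads + adam_states) / 1e9
print(f"~{total_gb:.0f} GB per GPU before activations")  # ~176 GB, far above 40 GB
```

Since the static state alone dwarfs a single 40 GB card, reducing the batch size cannot help by itself; the model state has to be sharded across GPUs (e.g., via DeepSpeed ZeRO or FSDP) or offloaded for full fine-tuning at this scale.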