I tried to fully fine-tune Flan-T5-XL, but I keep running into OOM (out-of-memory) errors. I am using 5 A5000 cards with 24 GB each, which should be enough in theory, yet I still get OOM. Do I have to use DeepSpeed? In the explanation, I saw the term 'DS unload'. 'Yes' there means that DeepSpeed was not used, right? Has anyone run similar experiments, and can you tell me the reason?
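For reference, here is a rough back-of-the-envelope check of the "should be enough in theory" claim. It is only a sketch under stated assumptions: roughly 3 billion parameters for Flan-T5-XL, fp32 training with an Adam-style optimizer (about 16 bytes per parameter for weights, gradients, and two optimizer moments), and no accounting for activations or framework overhead. The function name and constants are made up for illustration.

```python
# Rough estimate of training-state memory for full fine-tuning.
# Assumptions (not measured): ~3e9 parameters, fp32 weights (4 bytes),
# fp32 gradients (4 bytes), and two fp32 Adam moments (8 bytes),
# i.e. about 16 bytes per parameter. Activations are NOT included.

def training_state_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """Return weights + gradients + optimizer state in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 3e9       # assumed size of Flan-T5-XL
NUM_GPUS = 5           # from the post
GPU_MEMORY_GB = 24     # A5000, from the post

# Plain data parallelism keeps a full copy of the state on every GPU.
replicated_gb = training_state_gb(NUM_PARAMS)

# Idealized case where the state is split evenly across GPUs
# (the idea behind ZeRO-style sharding); real usage will be higher.
sharded_gb = replicated_gb / NUM_GPUS

print(f"per-GPU state, replicated: {replicated_gb:.1f} GB")
print(f"per-GPU state, sharded:    {sharded_gb:.1f} GB")
print(f"replicated fits in {GPU_MEMORY_GB} GB: {replicated_gb < GPU_MEMORY_GB}")
```

Under these assumptions, adding more cards does not help with plain data parallelism, because memory is not pooled across GPUs: each card still has to hold the full training state by itself. That would be consistent with the OOM described above, though the actual cause may differ depending on precision, batch size, and sequence length.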