shubhamagarwal92 opened this issue 6 months ago
Hi, I'm running into the same problem. Have you solved it?
I'm seeing the same problem.
With 2 H100s the code runs but I get OOM. When I increase to more than 2 GPUs, the model is duplicated on the GPUs instead of being sharded, and training gets stuck at Formatting inputs... Skip in lazy mode.
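A quick way to tell sharding from replication (an illustrative sketch, not something from the repo): under ZeRO-3 each rank should hold only about 1/N of the parameters, so per-GPU memory right after model loading should sit well below the full-model footprint; if every GPU shows a full-model-sized allocation, the weights are being replicated. For example:

# Watch per-GPU memory while the job initializes. Identical, full-model-sized
# allocations on every GPU point to replication rather than ZeRO-3 sharding.
watch -n 5 nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv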
Hi @haotian-liu!
Interesting work on LLaVA!
Issue:
I am trying to finetune LLaVA on 8 x H100 GPUs.
When I use DeepSpeed ZeRO Stage 3, the model appears to get replicated on every GPU instead of being sharded, and I run into OOM while finetuning. I am using a context length of 2048 and a ViT at 336 resolution.
Could you please suggest what I might be doing wrong here?
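For reference, parameters are only partitioned across ranks when the config sets "stage": 3 under zero_optimization; with stage 0, 1, or 2, every rank keeps a full copy of the weights, which looks exactly like replication. Below is a minimal sketch of such a config, written to a hypothetical zero3_example.json (the key names are standard DeepSpeed / HF Trainer ones, not necessarily the exact file used in this repo):

# Hypothetical file name; the "auto" values are resolved by the HF Trainer integration.
cat > zero3_example.json <<'EOF'
{
  "bf16": { "enabled": "auto" },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "sub_group_size": 1e9,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1e9,
    "stage3_max_reuse_distance": 1e9,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
EOF

If memory is still tight, DeepSpeed's offload_optimizer / offload_param options (with "device": "cpu") can push optimizer state and parameters to CPU at the cost of throughput.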
Command:
When I run the model with
CUDA_VISIBLE_DEVICES=0 bash ./scripts/sample_stage3.sh
the memory usage before training looks as follows (screenshot omitted). However, when I use the Stage 3 DeepSpeed config, the GPU usage before training looks like this (screenshot omitted), and the model then goes OOM. Could you please suggest which flag we might need to change?
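One thing worth double-checking (an observation based on the command above, not a confirmed fix): CUDA_VISIBLE_DEVICES=0 exposes only a single GPU to the process, so there is nothing for ZeRO-3 to shard across. For the Stage 3 run, all eight GPUs need to be visible and the training entry point has to go through the deepspeed launcher, which spawns one rank per GPU and lets the partitioning take effect. A sketch, reusing the script name from this issue and the hypothetical zero3_example.json above:

# Illustrative only: make all 8 GPUs visible to the launch script. Inside
# sample_stage3.sh the trainer should be started via the deepspeed launcher,
# e.g. deepspeed --num_gpus 8 <train_script>.py --deepspeed zero3_example.json,
# rather than plain python.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash ./scripts/sample_stage3.sh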