zhangfaen / finetune-Qwen2-VL


Have you run into OOM when fine-tuning the 7B model? #15

Open · junwenxiong opened this issue 4 weeks ago

junwenxiong commented 4 weeks ago

Hello, I'm fine-tuning Qwen2-VL-7B-Instruct on 4 A100s, but I still get OOM. I haven't changed anything in the codebase, so why would this happen?

zhangfaen commented 4 weeks ago

Common causes: the images are too large, the batch size is too large, etc.
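
One way to act on the image-size point, assuming the script loads the model through Hugging Face transformers (a sketch, not this repo's exact code), is to cap the processor's pixel budget so oversized inputs are downscaled before they reach the vision encoder. Qwen2-VL maps each 28x28 pixel patch to one visual token, so `max_pixels` directly bounds the per-image token count:

```python
from transformers import AutoProcessor

# max_pixels = 512*28*28 caps each image at roughly 512 visual tokens;
# anything larger is resized down by the processor before encoding.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=256 * 28 * 28,
    max_pixels=512 * 28 * 28,
)
```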

junwenxiong commented 3 weeks ago

I'm using the data that ships with the code and haven't swapped it out. Could it be a flash-attention problem? I'm on version 2.3.1.

rburchcp commented 2 weeks ago

I am also getting an out-of-memory error when trying to finetune Qwen2-VL-2B-Instruct on six A6000s (48 GB x 6). I am using flash attention 2, with batch_size=1, min_pixels=256x28x28, and max_pixels=512x28x28. I am training on eight videos, though, each 1920 x 1080 and eight seconds long.

zhangfaen commented 2 weeks ago

Replace `from torch.optim import AdamW` with `from torch.optim import SGD`.
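
For context, a minimal sketch of that swap (the linear layer is a toy stand-in for the real model): AdamW keeps two fp32 state tensors per parameter (`exp_avg` and `exp_avg_sq`, about 8 extra bytes per parameter, i.e. roughly 56 GB of optimizer state for a 7B model), while SGD with the default momentum=0 keeps none.

```python
from torch import nn
from torch.optim import SGD  # instead of: from torch.optim import AdamW

model = nn.Linear(8, 8)  # stand-in for the Qwen2-VL model in the script
# momentum=0 (the default) means SGD allocates no per-parameter state
optimizer = SGD(model.parameters(), lr=1e-5)
```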

rburchcp commented 2 weeks ago

That helped by allowing more batches to run than before, but I still eventually ran out of memory. Is there a way to lower the GPU memory utilization? See https://github.com/vllm-project/vllm/issues/2554

Perhaps gradient accumulation steps=1 will help?
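
On the accumulation question, note that peak activation memory is set by the micro-batch size, not by the accumulation count: accumulation only controls how many micro-batches are averaged before each optimizer step. A runnable toy sketch (the linear model and loader are stand-ins, not this repo's code):

```python
import torch
from torch import nn
from torch.optim import SGD

# Toy stand-ins; in the real script these are the Qwen2-VL model,
# its optimizer, and the video dataloader.
model = nn.Linear(16, 1)
optimizer = SGD(model.parameters(), lr=1e-3)
train_loader = [(torch.randn(1, 16), torch.randn(1, 1)) for _ in range(8)]

accum_steps = 4  # effective batch = micro-batch size (1) * accum_steps
optimizer.zero_grad()
for step, (x, y) in enumerate(train_loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()      # one update per accum_steps micro-batches
        optimizer.zero_grad()
```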