Closed warm345 closed 1 month ago
+1
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft --model_type qwen2-vl-7b-instruct --dataset custom.json
I'm fine-tuning a 7B model with LoRA and still get OOM. Why is that?
I ran an experiment: with MAX_PIXELS=602112 added, training runs normally; without it, I get OOM.
@warm345
Thanks. What image resolution does MAX_PIXELS=602112 correspond to? And where can I find the default MAX_PIXELS value in swift?
Does MAX_PIXELS mean the total pixel count summed over all three channels, or just w*h?
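For the two questions above, here is a rough sketch of the arithmetic. It assumes Qwen2-VL's usual preprocessing, where the pixel budget is compared against height * width (channels are not counted) and each 28x28 pixel block (patch size 14 with a 2x2 merge) becomes one visual token. The 896x672 resolution is only an illustrative example that happens to fill the budget exactly. This says nothing about what swift's default is.

```shell
#!/usr/bin/env bash
# Assumption: MAX_PIXELS bounds height * width of the resized image,
# and one visual token covers a 28x28 pixel block (14px patch, 2x2 merge).
MAX_PIXELS=602112
BLOCK=28

# Upper bound on visual tokens per image under this budget.
TOKENS=$(( MAX_PIXELS / (BLOCK * BLOCK) ))

# Illustrative 4:3 resolution that exactly fills the budget: 32 x 24 blocks.
WIDTH=$(( 32 * BLOCK ))
HEIGHT=$(( 24 * BLOCK ))

echo "max visual tokens per image: ${TOKENS}"
echo "example resolution: ${WIDTH}x${HEIGHT} = $(( WIDTH * HEIGHT )) pixels"
```

Under these assumptions 602112 = 768 * 28 * 28, so each image contributes at most 768 visual tokens. Larger images are scaled down to fit, which would explain why the setting reduces memory use.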
I ran an experiment: with MAX_PIXELS=602112 added, training runs normally; without it, I get OOM.
Hello, I'm training with a script. Where should this parameter be added?
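A minimal sketch of one way to do this, assuming the script is a shell script and that MAX_PIXELS is read from the environment (as the `MAX_PIXELS=602112` form in the earlier comment suggests). The file name `train.sh` is hypothetical. The command itself is the one posted at the top of this thread.

```shell
#!/usr/bin/env bash
# train.sh (hypothetical name): same launch command as above,
# with MAX_PIXELS exported before swift starts.

# Assumption: swift picks MAX_PIXELS up from the environment.
export MAX_PIXELS=602112
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NPROC_PER_NODE=4

swift sft --model_type qwen2-vl-7b-instruct --dataset custom.json
```

Equivalently, the variable can be prefixed on the same line as the command: `MAX_PIXELS=602112 CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft ...`. Either way it has to be set before the process starts, not passed as a `--` argument.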
[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 0 has a total capacity of 79.11 GiB of which 1.61 GiB is free. Process 977732 has 77.49 GiB memory in use. Of the allocated memory 75.16 GiB is allocated by PyTorch, and 1.27 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
I tried internvl2-8b with the same parameters and it ran without problems.