modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Qwen2-VL-7B-Instruct training runs out of GPU memory #2010

Closed · warm345 closed this issue 1 month ago

warm345 commented 2 months ago
MASTER_PORT=29510 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen2-vl-7b-instruct \
    --model_id_or_path $MODEL_PATH \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir $OUTPUT_PATH \
    --dataset $DATA_PATH \
    --train_dataset_sample -1 \
    --dataset_test_ratio 0.05 \
    --num_train_epochs 1 \
    --max_length 1024 \
    --check_dataset_strategy warning \
    --lora_rank 16 \
    --lora_alpha 8 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 2e-5 \
    --gradient_accumulation_steps 8 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit -1 \
    --logging_steps 10 \
    --check_model_is_latest false \
    --use_flash_attn false &> "$OUTPUT_PATH/$SUFFIX.log" 

torch==2.4.0
transformers==4.45.0.dev0

[rank0]: torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 0 has a total capacity of 79.11 GiB of which 1.61 GiB is free. Process 977732 has 77.49 GiB memory in use. Of the allocated memory 75.16 GiB is allocated by PyTorch, and 1.27 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

With the same parameters, internvl2-8b trains without any problem.
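
The allocator message above also points at one mitigation: when the "reserved but unallocated" pool is large, expandable segments can reduce fragmentation. A minimal sketch of how it would be prepended to the same command (this only addresses fragmentation, not an actual capacity shortfall):

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
MASTER_PORT=29510 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen2-vl-7b-instruct \
    ...   # remaining flags exactly as above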

YoungjaeDev commented 2 months ago

+1

CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft --model_type qwen2-vl-7b-instruct --dataset custom.json

I don't understand why I get OOM with LoRA on a 7B model. Why is that?

warm345 commented 2 months ago

I tried it: with MAX_PIXELS=602112 added, training runs normally; without it, training OOMs.
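
For reference, MAX_PIXELS is picked up from the environment by swift's image preprocessing for qwen2-vl, so it is prepended to the launch command like the other variables. A minimal sketch, with the remaining flags assumed unchanged from the original command:

MAX_PIXELS=602112 \
MASTER_PORT=29510 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen2-vl-7b-instruct \
    ...   # rest of the flags as in the original script

Capping the per-image pixel budget shrinks the number of visual tokens, which is where the activation memory goes for high-resolution inputs.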

YoungjaeDev commented 2 months ago

@warm345

Thanks. If MAX_PIXELS is 602112, what resolution does that correspond to? And where can I find the default MAX_PIXELS value in swift?
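
A back-of-the-envelope reading, assuming Qwen2-VL's granularity of one merged visual token per 28x28 pixel block: MAX_PIXELS bounds width*height of the resized image (the three color channels are not multiplied in), so 602112 allows at most 768 visual tokens per image, e.g. roughly an 896x672 image:

echo $(( 602112 / (28 * 28) ))   # 768 merged visual tokens at most
echo $(( 896 * 672 ))            # 602112: one resolution that exactly meets the cap

The default ceiling itself lives in the vision utils file linked in the next reply.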

Jintao-Huang commented 2 months ago

https://github.com/modelscope/ms-swift/blob/2d1aba96281c8f646881427fa857388b07fdcbef/swift/llm/utils/vision_utils.py#L273

thesby commented 1 month ago

Does MAX_PIXELS mean the total pixel count summed over the three channels, or just w*h?

youflyaway commented 1 month ago

> I tried it: with MAX_PIXELS=602112 added, training runs normally; without it, training OOMs.

Hello, I launch training from a shell script. Where should this parameter be added?
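
Since MAX_PIXELS is an environment variable rather than a swift sft flag, in a bash script it goes right before the swift sft invocation, on a continued line like MASTER_PORT in the script at the top of this issue. A minimal sketch, assuming a script of that same shape:

#!/bin/bash
MAX_PIXELS=602112 \
MASTER_PORT=29510 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model_type qwen2-vl-7b-instruct \
    ...   # remaining flags exactly as in the original script

Alternatively, an export MAX_PIXELS=602112 near the top of the script has the same effect for every command that follows.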