Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
[INFO|2024-11-20 17:36:10] modeling_utils.py:3934 >> loading weights file C:\Users\PC\.cache\huggingface\hub\models--Qwen--Qwen2-VL-2B-Instruct\snapshots\aca78372505e6cb469c4fa6a35c60265b00ff5a4\model.safetensors.index.json
[INFO|2024-11-20 17:36:10] modeling_utils.py:1670 >> Instantiating Qwen2VLForConditionalGeneration model under default dtype torch.bfloat16.
[INFO|2024-11-20 17:36:10] configuration_utils.py:1096 >> Generate config GenerationConfig { "bos_token_id": 151643, "eos_token_id": 151645 }
[INFO|2024-11-20 17:36:10] modeling_utils.py:1670 >> Instantiating Qwen2VisionTransformerPretrainedModel model under default dtype torch.bfloat16.
[WARNING|2024-11-20 17:36:10] logging.py:168 >> Qwen2VLRotaryEmbedding can now be fully parameterized by passing the model config through the config argument. All other arguments will be removed in v4.46
[INFO|2024-11-20 17:36:14] modeling_utils.py:4800 >> All model checkpoint weights were used when initializing Qwen2VLForConditionalGeneration.
[INFO|2024-11-20 17:36:14] modeling_utils.py:4808 >> All the weights of Qwen2VLForConditionalGeneration were initialized from the model checkpoint at C:\Users\PC\.cache\huggingface\hub\models--Qwen--Qwen2-VL-2B-Instruct\snapshots\aca78372505e6cb469c4fa6a35c60265b00ff5a4. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2VLForConditionalGeneration for predictions without further training.
[INFO|2024-11-20 17:36:14] configuration_utils.py:1049 >> loading configuration file C:\Users\PC\.cache\huggingface\hub\models--Qwen--Qwen2-VL-2B-Instruct\snapshots\aca78372505e6cb469c4fa6a35c60265b00ff5a4\generation_config.json
[INFO|2024-11-20 17:36:14] configuration_utils.py:1096 >> Generate config GenerationConfig { "bos_token_id": 151643, "do_sample": true, "eos_token_id": [ 151645, 151643 ], "pad_token_id": 151643, "temperature": 0.01, "top_k": 1, "top_p": 0.001 }
[INFO|2024-11-20 17:36:14] logging.py:157 >> Gradient checkpointing enabled.
[INFO|2024-11-20 17:36:14] logging.py:157 >> Using FlashAttention-2 for faster training and inference.
[INFO|2024-11-20 17:36:14] logging.py:157 >> Upcasting trainable params to float32.
[INFO|2024-11-20 17:36:14] logging.py:157 >> Fine-tuning method: LoRA
[INFO|2024-11-20 17:36:14] logging.py:157 >> Found linear modules: v_proj,k_proj,q_proj,o_proj,gate_proj,up_proj,down_proj
[INFO|2024-11-20 17:36:14] logging.py:157 >> trainable params: 9,232,384 || all params: 2,218,217,984 || trainable%: 0.4162
[INFO|2024-11-20 17:36:14] trainer.py:698 >> Using auto half precision backend
[INFO|2024-11-20 17:36:14] trainer.py:2313 >> Running training
[INFO|2024-11-20 17:36:14] trainer.py:2314 >> Num examples = 15
[INFO|2024-11-20 17:36:14] trainer.py:2315 >> Num Epochs = 100
[INFO|2024-11-20 17:36:14] trainer.py:2316 >> Instantaneous batch size per device = 2
[INFO|2024-11-20 17:36:14] trainer.py:2319 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|2024-11-20 17:36:14] trainer.py:2320 >> Gradient Accumulation steps = 8
[INFO|2024-11-20 17:36:14] trainer.py:2321 >> Total optimization steps = 100
[INFO|2024-11-20 17:36:14] trainer.py:2322 >> Number of trainable parameters = 9,232,384
[INFO|2024-11-20 17:36:38] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 4.9692e-05, 'epoch': 5.00}
[INFO|2024-11-20 17:37:02] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 4.8776e-05, 'epoch': 10.00}
[INFO|2024-11-20 17:37:27] logging.py:157 >> {'loss': 0.0000, 'learning_rate': 4.7275e-05, 'epoch': 15.00}
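For reference, the batch-size, step-count, and trainable-parameter figures printed by the trainer are mutually consistent. A quick check (assuming a single GPU, which matches this run) reproduces them:

import math

# Figures copied from the trainer log above; single-GPU run assumed.
num_examples = 15
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_epochs = 100

total_batch_size = per_device_batch_size * gradient_accumulation_steps   # 16
steps_per_epoch = math.ceil(num_examples / total_batch_size)             # 1
total_optimization_steps = steps_per_epoch * num_epochs                  # 100

trainable_params = 9_232_384
all_params = 2_218_217_984
trainable_pct = 100 * trainable_params / all_params                      # about 0.4162

print(total_batch_size, total_optimization_steps, round(trainable_pct, 4))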
Expected behavior
No response
Others
The training command is: llamafactory-cli train
--stage sft
--do_train True
--model_name_or_path C:\Users\PC\.cache\huggingface\hub\models--Qwen--Qwen2-VL-2B-Instruct\snapshots\aca78372505e6cb469c4fa6a35c60265b00ff5a4
--preprocessing_num_workers 16
--finetuning_type lora
--template qwen2_vl
--flash_attn fa2
--dataset_dir data
--dataset mllm_demo
--cutoff_len 2048
--learning_rate 5e-05
--num_train_epochs 100.0
--max_samples 100000
--per_device_train_batch_size 2
--gradient_accumulation_steps 8
--lr_scheduler_type cosine
--max_grad_norm 1.0
--logging_steps 5
--save_steps 100
--warmup_steps 0
--packing False
--report_to none
--output_dir saves\Qwen2-VL-2B-Instruct\lora\train_2024-11-20-17-41-13
--bf16 True
--plot_loss True
--ddp_timeout 180000000
--optim adamw_torch
--lora_rank 8
--lora_alpha 16
--lora_dropout 0
--lora_target all
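For comparison, here is a minimal stand-alone sketch of the same LoRA setup using transformers and peft directly. This is not LLaMA-Factory's internal code; it only mirrors the settings from the command above, with the target-module list copied from the "Found linear modules" line in the log, and can be used to confirm the adapter attaches to the expected layers:

import torch
from transformers import Qwen2VLForConditionalGeneration
from peft import LoraConfig, get_peft_model

# Load the base model in bfloat16 with FlashAttention-2, as in the run above.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-2B-Instruct",               # same checkpoint as the local snapshot
    torch_dtype=torch.bfloat16,                 # --bf16 True
    attn_implementation="flash_attention_2",    # --flash_attn fa2
)

# LoRA settings taken from the command: rank 8, alpha 16, dropout 0,
# target modules as reported by "Found linear modules" in the log.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # expected to report roughly the 9,232,384 trainable params seen in the log

This is only a sanity check of the adapter configuration, not a substitute for the llamafactory-cli run itself.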