modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Qwen2-VL 72B fine-tuning throws an error #2068

Open · zhangfan-algo opened 10 hours ago

zhangfan-algo commented 10 hours ago

Describe the bug

[screenshot of the error message attached in the original issue]

```shell
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun \
    --nproc_per_node ${num_gpu_per_node} \
    --master_port $MASTER_PORT \
    --master_addr $MASTER_ADDR \
    --node_rank $RANK \
    --nnodes $WORLD_SIZE \
    examples/pytorch/llm/llm_sft.py \
    --model_cache_dir Qwen2-VL-72B-Instruct \
    --model_type qwen2-vl-72b-instruct \
    --sft_type full \
    --freeze_vit true \
    --tuner_backend swift \
    --template_type AUTO \
    --output_dir output/zero-homework-correction-0830 \
    --ddp_backend nccl \
    --custom_train_dataset_path homework_correction_train3.jsonl \
    --dataset_test_ratio 0.01 \
    --self_cognition_sample -1 \
    --preprocess_num_proc 60 \
    --dataloader_num_workers 60 \
    --train_dataset_sample -1 \
    --dataset_test_ratio 0.01 \
    --save_strategy epoch \
    --lr_scheduler_type cosine \
    --save_total_limit 5 \
    --num_train_epochs 5 \
    --eval_steps 50 \
    --logging_steps 10 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --batch_size 1 \
    --gradient_accumulation_steps 8 \
    --deepspeed_config_path ds_z3_config.json \
    --weight_decay 0.01 \
    --learning_rate 1e-4 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --use_flash_attn false \
    --save_only_model false \
    --save_on_each_node false \
    --lazy_tokenize true \
    --neftune_noise_alpha 10 \
    --dtype AUTO
```
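The `ds_z3_config.json` passed via `--deepspeed_config_path` is not included in the report. A minimal DeepSpeed ZeRO stage-3 config of the kind this flag expects might look like the sketch below (field values are assumptions, not taken from the issue):

```shell
# Sketch of a minimal ZeRO-3 config; "auto" lets the HF Trainer
# integration fill in values from the training arguments.
cat > ds_z3_config.json <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" }
}
EOF
```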

Jintao-Huang commented 10 hours ago

```shell
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
```
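To confirm the install actually picked up this commit rather than a previously cached build, one can check the reported version (a diagnostic sketch; the exact dev version string will vary):

```shell
# A source install from a git commit typically reports a ".dev0" suffix,
# unlike a release wheel from PyPI.
pip show transformers | grep -i '^version'
python -c "import transformers; print(transformers.__version__)"
```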

zhangfan-algo commented 10 hours ago

> pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830

The error occurs precisely after installing this version.

Jintao-Huang commented 10 hours ago

Uninstall it first, then reinstall.
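Spelled out as commands, the suggested sequence is (using the same commit pinned above):

```shell
# Remove the existing transformers first so pip does not keep the old build,
# then install from the pinned commit.
pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
```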