ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 40+ MLLMs. (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
SimPO does not support zero3_offload distributed training #1095
Open
zhangfan-algo opened 4 weeks ago
Describe the bug
Your hardware and system info
torchrun --nproc_per_node ${num_gpu_per_node} --master_port $MASTER_PORT --master_addr $MASTER_ADDR --node_rank $RANK --nnodes $WORLD_SIZE examples/pytorch/llm/llm_simpo.py \
--model_cache_dir /mnt/cluster/swift_0522/output/qwen1half5_1_8B_pt_full_0604/qwen1half-1_8b-chat/v0-20240604-110136/checkpoint-4060 \
--model_type qwen1half-1_8b-chat \
--sft_type full \
--tuner_backend swift \
--template_type AUTO \
--ddp_backend nccl \
--custom_train_dataset_path simPO_data_train.jsonl \
--output_dir simPO_data_train_qwen1half5_1_8B_simpo_full_0605 \
--preprocess_num_proc 60 \
--dataloader_num_workers 60 \
--train_dataset_sample -1 \
--evaluation_strategy steps \
--eval_steps 50 \
--eval_batch_size 1 \
--dataset_test_ratio 0.01 \
--max_length 19500 \
--lr_scheduler_type cosine \
--num_train_epochs 5 \
--save_total_limit 5 \
--save_strategy epoch \
--logging_steps 10 \
--batch_size 1 \
--check_dataset_strategy warning \
--gradient_checkpointing true \
--gradient_accumulation_steps 8 \
--weight_decay 0.01 \
--learning_rate 1e-5 \
--max_grad_norm 0.5 \
--warmup_ratio 0.03 \
--use_flash_attn true \
--push_to_hub false \
--lazy_tokenize true \
--deepspeed_config_path /mnt/cluster/zhangfan/study_info/LLaMA-Factory_0506/examples/deepspeed/ds_z3_offload_config.json \
--save_only_model true \
--save_on_each_node false \
--neftune_noise_alpha 5 \
--dtype AUTO
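The issue does not show what `ds_z3_offload_config.json` contains. For reference, here is a minimal sketch of a typical DeepSpeed ZeRO-3 config that offloads both optimizer state and parameters to CPU. It assumes the Hugging Face Trainer DeepSpeed integration, which resolves the `"auto"` values from the training arguments. The file name and exact keys are assumptions and are not taken from the LLaMA-Factory file referenced above.

```shell
#!/usr/bin/env bash
# Sketch only: writes an assumed ZeRO-3 + CPU-offload DeepSpeed config.
# The contents of the LLaMA-Factory file used in the issue are NOT reproduced here.
set -euo pipefail

cfg=ds_z3_offload_config.json   # hypothetical local file name

cat > "$cfg" <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": { "enabled": "auto" },
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  }
}
EOF

# Confirm that ZeRO stage 3 is set in the written file.
grep -q '"stage": 3' "$cfg" && echo "wrote ZeRO-3 offload config to $cfg"
```

A file like this would be passed through the same `--deepspeed_config_path` flag used in the command above. Offloading both `offload_optimizer` and `offload_param` to CPU trades GPU memory for extra host-device transfer on every step.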