modelscope / swift

ms-swift: Use PEFT or full-parameter training to fine-tune 300+ LLMs or 40+ MLLMs. (Qwen2, GLM4, Internlm2.5, Yi, Llama3, Llava, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://github.com/modelscope/swift/blob/main/docs/source/LLM/index.md
Apache License 2.0

SimPO does not support zero3_offload distributed training #1095

Open zhangfan-algo opened 4 weeks ago

zhangfan-algo commented 4 weeks ago

Describe the bug

[screenshot of the error attached in the original issue]

Your hardware and system info

```shell
torchrun --nproc_per_node ${num_gpu_per_node} --master_port $MASTER_PORT --master_addr $MASTER_ADDR --node_rank $RANK --nnodes $WORLD_SIZE examples/pytorch/llm/llm_simpo.py \
    --model_cache_dir /mnt/cluster/swift_0522/output/qwen1half5_1_8B_pt_full_0604/qwen1half-1_8b-chat/v0-20240604-110136/checkpoint-4060 \
    --model_type qwen1half-1_8b-chat \
    --sft_type full \
    --tuner_backend swift \
    --template_type AUTO \
    --ddp_backend nccl \
    --custom_train_dataset_path simPO_data_train.jsonl \
    --output_dir simPO_data_train_qwen1half5_1_8B_simpo_full_0605 \
    --preprocess_num_proc 60 \
    --dataloader_num_workers 60 \
    --train_dataset_sample -1 \
    --evaluation_strategy steps \
    --eval_steps 50 \
    --eval_batch_size 1 \
    --dataset_test_ratio 0.01 \
    --max_length 19500 \
    --lr_scheduler_type cosine \
    --num_train_epochs 5 \
    --save_total_limit 5 \
    --save_strategy epoch \
    --logging_steps 10 \
    --batch_size 1 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --gradient_accumulation_steps 8 \
    --weight_decay 0.01 \
    --learning_rate 1e-5 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --use_flash_attn true \
    --push_to_hub false \
    --lazy_tokenize true \
    --deepspeed_config_path /mnt/cluster/zhangfan/study_info/LLaMA-Factory_0506/examples/deepspeed/ds_z3_offload_config.json \
    --save_only_model true \
    --save_on_each_node false \
    --neftune_noise_alpha 5 \
    --dtype AUTO
```
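For context, `--deepspeed_config_path` above points at a ZeRO-3 offload config file that is not included in the issue. The sketch below is only an illustration of what a typical `ds_z3_offload_config.json` looks like (the field values are assumptions, not the poster's actual settings):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "gradient_accumulation_steps": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_clipping": "auto"
}
```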

hjh0119 commented 2 weeks ago

Update the trl source code to 0.9.5.dev0; that should fix it.
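Since 0.9.5.dev0 was a development version rather than a PyPI release at the time, one way to get it (not stated in the thread, just a common approach) is to install trl directly from source:

```shell
# Install the development version of trl from the main branch
pip install git+https://github.com/huggingface/trl.git
```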