Open zsworld6 opened 1 month ago
llamafactory
llamafactory-cli train --stage dpo --do_train True --model_name_or_path --preprocessing_num_workers 16 --finetuning_type full --template llava --flash_attn auto --dataset_dir data --dataset --cutoff_len 1024 --learning_rate 5e-07 --num_train_epochs 3.0 --max_samples 100000 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --lr_scheduler_type cosine --max_grad_norm 1.0 --logging_steps 5 --save_steps 100 --warmup_steps 0 --optim adamw_torch --packing False --report_to wandb --output_dir --bf16 True --plot_loss True --ddp_timeout 180000000 --include_num_input_tokens_seen True --lora_rank 8 --lora_alpha 16 --lora_dropout 0 --lora_target all --pref_beta 0.1 --pref_ftx 0 --pref_loss sigmoid --deepspeed cache/ds_z3_config.json
No response
0%| | 0/840 [00:00<?, ?it/s]
It just stays stuck here without progressing, and then the run gets interrupted.
@zsworld6 did you solve this issue?
Not yet
Has this been resolved yet?
No, not yet.
Try changing "--deepspeed cache/ds_z3_config.json" to "--deepspeed cache/ds_z0_config.json".
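For reference, here is a sketch of what a ZeRO stage-0 DeepSpeed config of that kind can look like. This is an illustrative example, not the exact file shipped in the repo (check the project's example DeepSpeed configs for the real one); the "auto" values follow the convention the HF Trainer DeepSpeed integration supports, where they are filled in from the training arguments:

```json
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 0
  }
}
```

Stage 0 disables ZeRO parameter/optimizer partitioning entirely, so it avoids the extra collective communication that stage 3 performs at startup, which is one common place for multi-GPU runs to hang.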
Reminder
System Info
llamafactory version: 0.9.1.dev0
Reproduction
(Same llamafactory-cli train command as in the original post above.)
Expected behavior
No response
Others
(Same hang at 0%| | 0/840 [00:00<?, ?it/s] as described above, followed by the run being interrupted.)