hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

LLaVA_dpo won't run #5812

Open zsworld6 opened 1 month ago

zsworld6 commented 1 month ago

Reminder

System Info

Reproduction

llamafactory-cli train \
  --stage dpo \
  --do_train True \
  --model_name_or_path \
  --preprocessing_num_workers 16 \
  --finetuning_type full \
  --template llava \
  --flash_attn auto \
  --dataset_dir data \
  --dataset \
  --cutoff_len 1024 \
  --learning_rate 5e-07 \
  --num_train_epochs 3.0 \
  --max_samples 100000 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 8 \
  --lr_scheduler_type cosine \
  --max_grad_norm 1.0 \
  --logging_steps 5 \
  --save_steps 100 \
  --warmup_steps 0 \
  --optim adamw_torch \
  --packing False \
  --report_to wandb \
  --output_dir \
  --bf16 True \
  --plot_loss True \
  --ddp_timeout 180000000 \
  --include_num_input_tokens_seen True \
  --lora_rank 8 \
  --lora_alpha 16 \
  --lora_dropout 0 \
  --lora_target all \
  --pref_beta 0.1 \
  --pref_ftx 0 \
  --pref_loss sigmoid \
  --deepspeed cache/ds_z3_config.json

Expected behavior

No response

Others

0%| | 0/840 [00:00<?, ?it/s]

Training hangs here and never makes progress, and the run eventually gets interrupted.

NathanaelTamirat commented 1 month ago

@zsworld6 did you solve this issue?

zsworld6 commented 1 month ago

> @zsworld6 did you solve this issue?

Not yet

alexlai2860 commented 1 month ago

Has this been solved yet?

zsworld6 commented 1 month ago

> Has this been solved yet?

No.

zuojie2024 commented 6 days ago

Try changing “--deepspeed cache/ds_z3_config.json” to “--deepspeed cache/ds_z0_config.json”.
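
For anyone hitting the same hang, here is a minimal sketch of what such a ZeRO-0 config could look like. It is modeled on the ds_z0_config.json example that LLaMA-Factory ships under examples/deepspeed/, not copied from it, so check your local copy of the repo for the canonical file:

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 0
  }
}

Save it as cache/ds_z0_config.json (matching the path used in the command above) and pass it via --deepspeed. The trade-off: ZeRO-3 shards parameters, gradients, and optimizer states across GPUs, while ZeRO-0 keeps a full copy on every GPU, so this workaround needs more per-GPU memory but sidesteps the ZeRO-3 hang.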