Question about DPO with multi-turn dialogue

chloefresh commented 6 months ago

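I converted my multi-turn dialogue data into the format below and ran LoRA training with the DPO code. After merging, inference became slower and the output contains repeated content. The only code change was replacing "prompt": ["Question: " + question + "\n\nAnswer: " for question in examples["question"]] with "prompt": examples["question"]. Do I also need to add an end-of-sequence token after each dialogue turn, as in multi-turn SFT?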

{"question": "\n\nHuman:你好\n\nAssistant:你好\n\nHuman:你好\n\nAssistant:", "response_chosen": "您好", "response_rejected": "您好，有什么可以帮您的吗"}

The parameters I used:

CUDA_VISIBLE_DEVICES=4,5,6 python dpo_training.py \
    --model_type baichuan \
    --model_name_or_path <SFT-tuned base model> \
    --train_file_dir ./reward \
    --validation_file_dir ./reward \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --max_train_samples -1 \
    --max_eval_samples -1 \
    --max_steps 100 \
    --eval_steps 20 \
    --save_steps 50 \
    --max_source_length 1024 \
    --max_target_length 256 \
    --output_dir outputs-dpo-v1 \
    --target_modules all \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --fp16 True \
    --device_map auto \
    --report_to tensorboard \
    --remove_unused_columns False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --gradient_accumulation_steps 4

shibing624 commented 6 months ago

You can add the end-of-sequence token manually.
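A minimal sketch of what adding it manually could look like, assuming the "\n\nHuman:" / "\n\nAssistant:" layout from the sample above. The helper names `add_eos_to_turns` and `format_example`, the output keys "chosen" / "rejected", and the "</s>" token are illustrative assumptions rather than code from dpo_training.py; in practice the token would come from `tokenizer.eos_token`, and it is worth checking whether the trainer already appends one to the responses.

```python
# Assumption: "</s>" stands in for tokenizer.eos_token of the model being trained.
EOS = "</s>"
HUMAN_TAG = "\n\nHuman:"
ASSISTANT_TAG = "\n\nAssistant:"


def add_eos_to_turns(prompt: str, eos: str = EOS) -> str:
    """Append eos after every completed Assistant turn in a multi-turn prompt.

    The final "\n\nAssistant:" is left open, because the chosen/rejected
    response is what completes it.
    """
    parts = prompt.split(ASSISTANT_TAG)
    out = parts[0]
    for part in parts[1:]:
        out += ASSISTANT_TAG
        if HUMAN_TAG in part:
            # A completed turn: the reply runs up to the next Human tag.
            reply, rest = part.split(HUMAN_TAG, 1)
            out += reply + eos + HUMAN_TAG + rest
        else:
            # The last, still-open Assistant turn (or a trailing reply).
            out += part
    return out


def ensure_eos(text: str, eos: str = EOS) -> str:
    """Append eos to a response only if it is not already there."""
    return text if text.endswith(eos) else text + eos


def format_example(example: dict, eos: str = EOS) -> dict:
    """Hypothetical per-example mapper from the dataset fields in this issue."""
    return {
        "prompt": add_eos_to_turns(example["question"], eos),
        "chosen": ensure_eos(example["response_chosen"], eos),
        "rejected": ensure_eos(example["response_rejected"], eos),
    }


sample = {
    "question": "\n\nHuman:Hello\n\nAssistant:Hello\n\nHuman:Hello\n\nAssistant:",
    "response_chosen": "Hello",
    "response_rejected": "Hello, how can I help you?",
}
formatted = format_example(sample)
```

With this layout, every earlier Assistant reply in the prompt and both candidate responses end with the end-of-sequence token, which matches how multi-turn SFT data is usually terminated.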

chloefresh commented 6 months ago

@shibing624 After DPO training finished, inference became noticeably slower. What could be the cause?

shibing624 commented 6 months ago

I haven't noticed any particularly obvious difference.

shibing624 / MedicalGPT

Question about DPO with multi-turn dialogue #293