Closed: yata0 closed this issue 4 months ago
Only loading the RM with LLaMA-Factory is supported:
llamafactory-cli api --model_name_or_path xx --template xx --stage rm

How can I evaluate the RM?
@hiyouga
Change do_train in the training script to do_eval.
Got it. Modify the YAML: do_train: false, do_eval: false, do_predict: true, and adapter_name_or_path: <the trained LoRA>.
The reward prediction results will be written to output_dir.
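Not quoted verbatim from the thread, but a minimal sketch of that prediction setup, with placeholder paths and dataset name (the full config posted further down in this thread is a concrete example):

stage: rm
finetuning_type: lora
model_name_or_path: path/to/base_model        # placeholder
adapter_name_or_path: path/to/trained_lora    # placeholder
do_train: false
do_eval: false
do_predict: true
eval_dataset: your_pairwise_dataset           # placeholder
template: qwen
output_dir: path/to/reward_predictions        # predictions are written here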
@hiyouga @xd2333 Only 100 results were produced. Why? The dataset contains 1000 entries.

Command: llamafactory-cli train /root/autodl-tmp/llm_prj/AdGen/config/reward_infer_model.yaml

Contents of reward_infer_model.yaml:

### model
model_name_or_path: /root/autodl-tmp/llm_prj/AdGen/reward_model/merge

### method
stage: rm
do_train: false
do_eval: false
do_predict: true

### dataset
dataset: ad_dpo
template: qwen
cutoff_len: 1024
max_samples: 10000
overwrite_cache: true
preprocessing_num_workers: 16

### output
output_dir: /root/autodl-tmp/llm_prj/AdGen/reward_model/infer
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

### eval
val_size: 0.1
per_device_eval_batch_size: 1
eval_strategy: steps
eval_steps: 500

The ad_dpo dataset entry is configured as follows:

"ad_dpo": {
  "file_name": "/root/autodl-tmp/llm_prj/AdGen/data/dpo/ad_dpo.jsonl",
  "ranking": true,
  "columns": {
    "prompt": "instruction",
    "chosen": "chosen",
    "rejected": "rejected"
  }
}

Output:
Set eval_dataset: ad_dpo, and remove val_size: 0.1, eval_strategy: steps, and eval_steps: 500. (With val_size: 0.1, only a 10% split of the 1000 samples, i.e. 100, is used for prediction, which is why only 100 results come out.)
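For reference, a sketch of the adjusted eval section under that advice, with the rest of reward_infer_model.yaml left exactly as posted above:

### eval
eval_dataset: ad_dpo
per_device_eval_batch_size: 1
# removed: val_size: 0.1, eval_strategy: steps, eval_steps: 500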
Only loading the RM with LLaMA-Factory is supported: llamafactory-cli api --model_name_or_path xx --template xx --stage rm
Could you provide a demo of a standard request script for the multimodal case? I have been trying for a while and cannot figure out how to assemble the messages...
I launched the server with:
llamafactory-cli api --stage rm --template qwen2_vl --model_name_or_path models/qwen2_vl_rm_lora_1027_3sets
Alternatively, if using the trl library, how should I load the model and run inference to get the scores?
Thanks a lot!
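Not an answer from the maintainers, but a minimal text-only sketch of the trl route, assuming the reward model has been exported (LoRA merged) into a standalone directory. The path, prompt, and response below are placeholders, and a Qwen2-VL reward model would additionally need its processor to encode image inputs, which is not covered here:

import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

model_dir = "path/to/exported_reward_model"  # placeholder: merged/exported RM directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
# note: if the exported checkpoint does not carry v_head weights, trl initializes a fresh
# value head and the resulting scores are meaningless
model = AutoModelForCausalLMWithValueHead.from_pretrained(model_dir)
model.eval()

# build a prompt + response pair with the model's chat template
messages = [
    {"role": "user", "content": "Write a short ad for a waterproof jacket."},
    {"role": "assistant", "content": "Stay dry and look sharp, whatever the weather."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # the value-head wrapper returns (lm_logits, loss, values); values has shape (batch, seq_len)
    _, _, values = model(**inputs)

# the reward is commonly read off the value of the last (non-padded) token
score = values[0, -1].item()
print(score)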
@hiyouga @xd2333
A question: does reward model training support a data format like openbookqa, i.e. one prompt with multiple responses, as in InstructGPT?
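Not confirmed in this thread, but since the rm stage consumes pairwise data (as in the ad_dpo entry above), one common workaround for InstructGPT-style rankings is to expand each prompt's ranked responses into chosen/rejected pairs. A hedged sketch, with a hypothetical input sample and output file name:

import itertools
import json

def ranking_to_pairs(prompt, ranked_responses):
    """Expand responses ordered best-to-worst into pairwise RM samples."""
    pairs = []
    for better, worse in itertools.combinations(ranked_responses, 2):
        pairs.append({"instruction": prompt, "chosen": better, "rejected": worse})
    return pairs

# hypothetical input: one prompt with responses already sorted best-to-worst
sample = {
    "prompt": "Which gas do plants absorb for photosynthesis?",
    "responses": ["Carbon dioxide.", "Oxygen, I think.", "Plants do not absorb gases."],
}

with open("rm_pairs.jsonl", "w", encoding="utf-8") as f:
    for pair in ranking_to_pairs(sample["prompt"], sample["responses"]):
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")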
Reminder
System Info
Reproduction
Export: llamafactory-cli export --model_name_or_path="./save" --stage=rm --export_dir="./see12" --template=default
Test:
Error reported: v_head weight is found. This IS expected if you are not resuming PPO training
https://github.com/hiyouga/LLaMA-Factory/issues/4379#issue-2362864279
Expected behavior
No response
Others
No response