hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Reloading and evaluating a trained reward model #4743

Closed yata0 closed 4 months ago

yata0 commented 4 months ago

Reminder

System Info

Reproduction

  1. Export: llamafactory-cli export --model_name_or_path="./save" --stage=rm --export_dir="./see12" --template=default

  2. Test:

    from trl import AutoModelForCausalLMWithValueHead

    # try loading the exported reward model with trl's value-head wrapper
    model_path = "./see12"
    model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path, trust_remote_code=True)

    This fails with the warning: "No v_head weight is found. This IS expected if you are not resuming PPO training", i.e. the value-head weights are not loaded.

https://github.com/hiyouga/LLaMA-Factory/issues/4379#issue-2362864279

Expected behavior

No response

Others

No response

hiyouga commented 4 months ago

The RM can only be loaded with llamafactory:

    llamafactory-cli api --model_name_or_path xx --template xx --stage rm
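For reference, here is a minimal client sketch for scoring texts against a server started this way. It assumes the API exposes the /v1/score/evaluation endpoint found in recent LLaMA-Factory versions and that the response carries a "scores" list; verify both against your installed version.

    import requests

    # hypothetical host/port; adjust to wherever llamafactory-cli api is serving
    API_URL = "http://localhost:8000/v1/score/evaluation"

    payload = {
        "model": "rm",  # placeholder name; the server scores with the loaded RM
        # each entry is one complete prompt+response text to score
        "messages": [
            "Human: How do I boil an egg?\nAssistant: Boil it for about 8 minutes.",
        ],
    }

    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json())  # expected shape: {..., "scores": [<one float per input>]}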

yata0 commented 4 months ago

llamafactory-cli api --model_name_or_path xx --template xx --stage rm

How can I evaluate the RM? @hiyouga

hiyouga commented 4 months ago

Change do_train to do_eval in the training config.

xd2333 commented 4 months ago

Got it. Modify the YAML:

    do_train: false
    do_eval: false
    do_predict: true
    adapter_name_or_path: <path to the trained LoRA>

The reward prediction results will be written to output_dir.

bruceguo123 commented 4 months ago

@hiyouga @xd2333 Only 100 results were output. Why? The dataset contains 1000 entries.

Command:

    llamafactory-cli train /root/autodl-tmp/llm_prj/AdGen/config/reward_infer_model.yaml

Contents of reward_infer_model.yaml:

    ### model
    model_name_or_path: /root/autodl-tmp/llm_prj/AdGen/reward_model/merge

    ### method
    stage: rm
    do_train: false
    do_eval: false
    do_predict: true

    ### dataset
    dataset: ad_dpo
    template: qwen
    cutoff_len: 1024
    max_samples: 10000
    overwrite_cache: true
    preprocessing_num_workers: 16

    ### output
    output_dir: /root/autodl-tmp/llm_prj/AdGen/reward_model/infer
    logging_steps: 10
    save_steps: 500
    plot_loss: true
    overwrite_output_dir: true

    ### train
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 8
    learning_rate: 1.0e-4
    num_train_epochs: 3.0
    lr_scheduler_type: cosine
    warmup_ratio: 0.1
    bf16: true
    ddp_timeout: 180000000

    ### eval
    val_size: 0.1
    per_device_eval_batch_size: 1
    eval_strategy: steps
    eval_steps: 500

The ad_dpo dataset is configured as follows:

    "ad_dpo": {
      "file_name": "/root/autodl-tmp/llm_prj/AdGen/data/dpo/ad_dpo.jsonl",
      "ranking": true,
      "columns": {
        "prompt": "instruction",
        "chosen": "chosen",
        "rejected": "rejected"
      }
    }
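With this column mapping, each line of ad_dpo.jsonl would look roughly like the following (a hypothetical record, values purely illustrative):

    {"instruction": "Write a short ad for a wireless mouse", "chosen": "Glide through your day with zero-lag precision...", "rejected": "mouse is good buy it"}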

Output: [screenshot]

xd2333 commented 4 months ago

> Only 100 results were output. Why? The dataset contains 1000 entries.

Set eval_dataset: ad_dpo and remove val_size: 0.1, eval_strategy: steps, and eval_steps: 500. With val_size: 0.1, only 10% of the 1000 samples (100 entries) are split off for evaluation, which is why only 100 predictions were produced.
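Applied to the config above, the change would look roughly like this (a sketch of the suggestion; all other keys stay as they were):

    ### dataset
    dataset: ad_dpo
    eval_dataset: ad_dpo   # predict over the full dataset rather than a 10% split

    ### eval
    per_device_eval_batch_size: 1
    # removed: val_size: 0.1, eval_strategy: steps, eval_steps: 500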

rover5056 commented 1 month ago

The RM can only be loaded with llamafactory: llamafactory-cli api --model_name_or_path xx --template xx --stage rm

Could you share a demo of a standard request script for a multimodal RM? I have tried for a while and cannot figure out how to assemble the messages. I started the server with:

    llamafactory-cli api --stage rm --template qwen2_vl --model_name_or_path models/qwen2_vl_rm_lora_1027_3sets

Alternatively, if using the trl library, how should I load the model and run inference to get the score?

Thanks!

@hiyouga @xd2333
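For the trl route, the standard value-head pattern is sketched below. Note the maintainer states above that RMs exported by LLaMA-Factory are only loadable through llamafactory itself, so the v_head weights may not restore with trl (hence the "No v_head weight is found" warning); this sketch shows the generic trl mechanics, assuming a checkpoint whose value head does load.

    import torch
    from transformers import AutoTokenizer
    from trl import AutoModelForCausalLMWithValueHead

    model_path = "./see12"  # hypothetical path to an RM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path)
    model.eval()

    text = "Human: How do I boil an egg?\nAssistant: Boil it for about 8 minutes."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # the wrapper's forward returns (lm_logits, loss, values);
        # values has shape (batch, seq_len): one value-head output per token
        _, _, values = model(**inputs)

    # the value at the final token is conventionally taken as the reward score
    score = values[0, -1].item()
    print(score)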

world2025 commented 3 weeks ago

A question: does reward model training support a data format like OpenBookQA, with one prompt and multiple responses, as in InstructGPT?