hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Reloading and evaluating a trained reward model #4743

Closed yata0 closed 4 months ago

yata0 commented 4 months ago

Reminder

System Info

Reproduction

  1. Export: llamafactory-cli export --model_name_or_path="./save" --stage=rm --export_dir="./see12" --template=default

  2. Test:

    from trl import AutoModelForCausalLMWithValueHead

    # try loading the exported reward model with trl's value-head wrapper
    model_path = "./see12"
    model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path, trust_remote_code=True)

    This fails with the warning: "No v_head weight is found. This IS expected if you are not resuming PPO training", i.e. the value-head weights are not loaded.

https://github.com/hiyouga/LLaMA-Factory/issues/4379#issue-2362864279

Expected behavior

No response

Others

No response

hiyouga commented 4 months ago

The RM can only be loaded with llamafactory:

    llamafactory-cli api --model_name_or_path xx --template xx --stage rm
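For reference, here is a minimal client sketch for scoring texts against a server started this way. It assumes the API exposes the /v1/score/evaluation endpoint found in recent LLaMA-Factory versions and that the response carries a "scores" list; verify both against your installed version.

    import requests

    # hypothetical host/port; adjust to wherever llamafactory-cli api is serving
    API_URL = "http://localhost:8000/v1/score/evaluation"

    payload = {
        "model": "rm",  # placeholder name; the server scores with the loaded RM
        # each entry is one complete prompt+response text to score
        "messages": [
            "Human: How do I boil an egg?\nAssistant: Boil it for about 8 minutes.",
        ],
    }

    resp = requests.post(API_URL, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json())  # expected shape: {..., "scores": [<one float per input>]}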

yata0 commented 4 months ago

llamafactory-cli api --model_name_or_path xx --template xx --stage rm

How can I evaluate the RM? @hiyouga

hiyouga commented 4 months ago

Change do_train to do_eval in the training config.

xd2333 commented 4 months ago

Got it. Modify the YAML:

    do_train: false
    do_eval: false
    do_predict: true
    adapter_name_or_path: <path to the trained LoRA>

The reward prediction results will be written to output_dir.

bruceguo123 commented 4 months ago

@hiyouga @xd2333 Only 100 results were output. Why? The dataset contains 1000 entries.

Command:

    llamafactory-cli train /root/autodl-tmp/llm_prj/AdGen/config/reward_infer_model.yaml

Contents of reward_infer_model.yaml:

    ### model
    model_name_or_path: /root/autodl-tmp/llm_prj/AdGen/reward_model/merge

    ### method
    stage: rm
    do_train: false
    do_eval: false
    do_predict: true

    ### dataset
    dataset: ad_dpo
    template: qwen
    cutoff_len: 1024
    max_samples: 10000
    overwrite_cache: true
    preprocessing_num_workers: 16

    ### output
    output_dir: /root/autodl-tmp/llm_prj/AdGen/reward_model/infer
    logging_steps: 10
    save_steps: 500
    plot_loss: true
    overwrite_output_dir: true

    ### train
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 8
    learning_rate: 1.0e-4
    num_train_epochs: 3.0
    lr_scheduler_type: cosine
    warmup_ratio: 0.1
    bf16: true
    ddp_timeout: 180000000

    ### eval
    val_size: 0.1
    per_device_eval_batch_size: 1
    eval_strategy: steps
    eval_steps: 500

The ad_dpo dataset is configured as follows:

    "ad_dpo": {
      "file_name": "/root/autodl-tmp/llm_prj/AdGen/data/dpo/ad_dpo.jsonl",
      "ranking": true,
      "columns": {
        "prompt": "instruction",
        "chosen": "chosen",
        "rejected": "rejected"
      }
    }
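With this column mapping, each line of ad_dpo.jsonl would look roughly like the following (a hypothetical record, values purely illustrative):

    {"instruction": "Write a short ad for a wireless mouse", "chosen": "Glide through your day with zero-lag precision...", "rejected": "mouse is good buy it"}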

Output: [screenshot]

xd2333 commented 4 months ago

> Only 100 results were output. Why? The dataset contains 1000 entries.

Set eval_dataset: ad_dpo and remove val_size: 0.1, eval_strategy: steps, and eval_steps: 500. With val_size: 0.1, only 10% of the 1000 samples (100 entries) are split off for evaluation, which is why only 100 predictions were produced.
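Applied to the config above, the change would look roughly like this (a sketch of the suggestion; all other keys stay as they were):

    ### dataset
    dataset: ad_dpo
    eval_dataset: ad_dpo   # predict over the full dataset rather than a 10% split

    ### eval
    per_device_eval_batch_size: 1
    # removed: val_size: 0.1, eval_strategy: steps, eval_steps: 500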

rover5056 commented 1 month ago

The RM can only be loaded with llamafactory: llamafactory-cli api --model_name_or_path xx --template xx --stage rm

Could you share a demo of a standard request script for a multimodal RM? I have tried for a while and cannot figure out how to assemble the messages. I started the server with:

    llamafactory-cli api --stage rm --template qwen2_vl --model_name_or_path models/qwen2_vl_rm_lora_1027_3sets

Alternatively, if using the trl library, how should I load the model and run inference to get the score?

Thanks!

@hiyouga @xd2333
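For the trl route, the standard value-head pattern is sketched below. Note the maintainer states above that RMs exported by LLaMA-Factory are only loadable through llamafactory itself, so the v_head weights may not restore with trl (hence the "No v_head weight is found" warning); this sketch shows the generic trl mechanics, assuming a checkpoint whose value head does load.

    import torch
    from transformers import AutoTokenizer
    from trl import AutoModelForCausalLMWithValueHead

    model_path = "./see12"  # hypothetical path to an RM checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLMWithValueHead.from_pretrained(model_path)
    model.eval()

    text = "Human: How do I boil an egg?\nAssistant: Boil it for about 8 minutes."
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():
        # the wrapper's forward returns (lm_logits, loss, values);
        # values has shape (batch, seq_len): one value-head output per token
        _, _, values = model(**inputs)

    # the value at the final token is conventionally taken as the reward score
    score = values[0, -1].item()
    print(score)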

world2025 commented 3 weeks ago

A question: does reward model training support a data format like OpenBookQA, with one prompt and multiple responses, as in InstructGPT?