mindspore-lab / mindrlhf


Run reward model train example failed #57

Closed dhcode-cpp closed 8 months ago

dhcode-cpp commented 9 months ago

environment: ModelArts Ascend 910A(32GB) x 8

Request: please add complete documentation for full reward model training.

Below are the problems I ran into while debugging the reward model.

  1. Problem 1: Following the official tutorial, there is no run_mindformer.py script in the mindrlhf project

examples/reward_model_train_tutorial/README.md:

The example command below is supposed to run reward model training based on a 12-layer GPT2 model:

```bash
python run_mindformer.py --config ../configs/gpt2/run_reward_gpt2.yaml \
                      --run_mode train \
                      --device_target Ascend
```

Problem 2: there is no GPT2RewardModel

Neither the mindformers library nor mindrlhf contains a GPT2RewardModel implementation.

- model_config/gpt.yaml

```yaml
model:
  arch:
    # type: GPT2LMHeadModel
    type: GPT2RewardModel
```

2. LLaMA2 Reward Model training

  1. Download the data

```bash
git clone http://www.modelscope.cn/datasets/damo/CValues-Comparison.git
```
  2. Download the model

```python
import mindspore
from mindformers import AutoConfig, AutoModel, AutoTokenizer

# use graph mode (mode=0 is GRAPH_MODE) and select the training device id
mindspore.set_context(mode=0, device_id=0)

tokenizer = AutoTokenizer.from_pretrained('llama2_7b')

# the model can be instantiated in either of the two ways below; pick one
# 1. instantiate directly from the default config
model = AutoModel.from_pretrained('llama2_7b')
# 2. customize the config, then instantiate
config = AutoConfig.from_pretrained('llama2_7b')
config.use_past = True                  # enable incremental (KV-cache) inference for faster generation
# config.xxx = xxx                      # customize other model options as needed
model = AutoModel.from_config(config)   # instantiate the model from the customized config

inputs = tokenizer("I love Beijing, because")["input_ids"]
# the first model.generate() call includes graph compilation time, so its latency
# is not representative; call it several times to measure real inference speed
outputs = model.generate(inputs, max_new_tokens=30, do_sample=False)
response = tokenizer.decode(outputs)
print(response)
# ['<s>I love Beijing, because it’s a city that is constantly changing. I have been living here for 10 years and I have seen the city change so much.I']
```
  3. Process the data

Note: train.jsonl contains no data, while test.jsonl does; this can be confirmed with the sketch below.
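A quick sketch to check which split actually contains records (it simply prints whatever field names the dataset ships with, so no assumptions about the schema are baked in):

```python
# Count records in each CValues-Comparison split and peek at the schema
import json

for split in ("train.jsonl", "test.jsonl"):
    path = f"./data/CValues-Comparison/{split}"
    with open(path, encoding="utf-8") as f:
        lines = [ln for ln in f if ln.strip()]
    print(split, "->", len(lines), "records")
    if lines:
        print("  fields:", list(json.loads(lines[0]).keys()))
```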

```bash
python ./examples/reward_model_train_tutorial/cvalues_comparison.py \
    --model /home/ma-user/work/mindrlhf/model/checkpoint_download/llama2 \
    --src_file=./data/CValues-Comparison/test_small.jsonl \
    --dst_file=./data/reward_data.mindrecord \
    --seq_length=512
```

When running it, the conversion complains that tokenizer_config.json is missing from the model directory:

```text
2024-01-14 21:40:10,204 - mindformers[base_tokenizer.py:1988] - WARNING - Can't find the tokenizer_config.json in the file_dict. The content of file_dict is : {}
```

After manually placing tokenizer_config.json in the model directory, the training dataset is generated successfully.
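Before launching distributed training, it is worth verifying that the generated MindRecord is non-empty. A minimal sketch using MindSpore's dataset API, with the path taken from the conversion command above:

```python
# Sanity check: confirm the generated MindRecord actually contains samples
import mindspore.dataset as ds

dataset = ds.MindDataset("./data/reward_data.mindrecord")
print("num samples:", dataset.get_dataset_size())

# peek at the column names of the first record
for item in dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
    print("columns:", list(item.keys()))
    break
```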

  4. Run training

```bash
bash ./scripts/run_distribute_reward.sh 'python ./examples/reward_model_train_tutorial/reward_train.py --run_mode=train --use_parallel True --config /home/ma-user/work/mindrlhf/model_configs/llama2_config/run_llama_2_7b_rm.yaml --train_dataset ./data/reward_data.mindrecord' ./jobstart_hccl.json [0,8] 8
```

Execution stops at the point below and never proceeds to training:

```text
2024-01-14 22:21:19,893 - mindformers[utils.py:336] - INFO - .........Building model.........
[WARNING] MD(27915,ffffba771010,python):2024-01-14-22:21:19.911.955 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:163] ~DataQueueOp]
preprocess_batch: 100;
batch_queue: 1, 1, 1, 1, 1, 1, 1, 1, 1, 1;
            push_start_time -> push_end_time
2024-01-14-22:21:17.857.000 -> 2024-01-14-22:21:17.857.242
2024-01-14-22:21:17.857.349 -> 2024-01-14-22:21:17.857.577
2024-01-14-22:21:17.859.719 -> 2024-01-14-22:21:17.860.065
2024-01-14-22:21:17.860.226 -> 2024-01-14-22:21:17.860.449
2024-01-14-22:21:17.860.592 -> 2024-01-14-22:21:17.860.806
2024-01-14-22:21:17.860.972 -> 2024-01-14-22:21:17.861.187
2024-01-14-22:21:17.861.321 -> 2024-01-14-22:21:17.861.538
2024-01-14-22:21:17.861.701 -> 2024-01-14-22:21:17.861.922
2024-01-14-22:21:17.862.068 -> 2024-01-14-22:21:17.862.284
2024-01-14-22:21:17.862.383 -> 2024-01-14-22:21:17.862.594
For more details, please refer to the FAQ at https://www.mindspore.cn/docs/en/master/faq/data_processing.html.
[WARNING] MD(27915,ffffba771010,python):2024-01-14-22:21:20.153.132 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:163] ~DataQueueOp]
preprocess_batch: 0;
batch_queue: 0;
            push_start_time -> push_end_time
```
ChessQian commented 9 months ago

Problem 1: following the official tutorial, there is no run_mindformer.py script in the mindrlhf project.
Solution: the tutorial in this repo is currently based on modifying the network inside mindformers and then running it there, so that file lives in mindformers.

Problem 2: no GPT2RewardModel.
Solution: refer to the mindformers implementation at https://gitee.com/mindspore/mindformers/blob/dev/mindformers/models/bloom/bloom_reward.py and adapt the GPT2 model in the same way (a sketch of the idea follows below).

Problem 3: tokenizer_config.json is missing at runtime.
Solution: you need to manually download the corresponding model's tokenizer.model and tokenizer_config.json.

Problem 4: execution stalls and training cannot proceed.
Solution: your error indicates the MindRecord contains no data; enable INFO-level logging (for MindSpore, e.g. by setting the environment variable GLOG_v=1) to locate the problem.
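For Problem 2, here is a minimal sketch of the bloom_reward.py pattern applied to GPT2; the import path, the `.backbone` attribute, and the input layout are assumptions, not the actual mindformers API. The idea: a shared transformer backbone encodes both responses, a scalar value head produces a reward per sequence, and a pairwise ranking loss pushes the chosen response above the rejected one.

```python
# Sketch only: adapting the bloom_reward.py idea to GPT2.
# The import path and the .backbone attribute are assumptions.
import mindspore.nn as nn
import mindspore.ops as ops
from mindformers import GPT2LMHeadModel  # hypothetical import

class GPT2RewardModel(nn.Cell):
    def __init__(self, config):
        super().__init__()
        # reuse the GPT2 transformer backbone, drop the LM head
        self.backbone = GPT2LMHeadModel(config).backbone  # assumed attribute
        self.v_head = nn.Dense(config.hidden_size, 1, has_bias=False)

    def construct(self, chosen_ids, rejected_ids):
        # hidden states for both sequences: (bs, seq_len, hidden_size)
        chosen_h = self.backbone(chosen_ids)
        rejected_h = self.backbone(rejected_ids)
        # scalar reward taken at the final token position
        chosen_r = self.v_head(chosen_h)[:, -1, 0]
        rejected_r = self.v_head(rejected_h)[:, -1, 0]
        # pairwise ranking loss: the chosen response should score higher
        return -ops.log(ops.sigmoid(chosen_r - rejected_r)).mean()
```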

ChessQian commented 9 months ago

Here is the latest tutorial, please give it a try: https://github.com/mindspore-lab/mindrlhf/pull/58

dhcode-cpp commented 8 months ago

Thanks! The LLaMA2 reward model training now runs successfully.

[screenshot: training log showing the successful run]