thaumstrial opened this issue 1 year ago
btw, it's better to randomly shuffle the dataset, or the models will overfit.
Thanks for your feedback. We changed our RM training code a lot last week, and it will be released later this week.
The RM fine-tuning on these datasets is set to 1 epoch, following the RLHF papers. In any case, shuffling the dataset is necessary, and we will fix it in an upcoming PR.
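For reference, per-epoch shuffling comes for free from PyTorch's DataLoader; here is a minimal sketch with a toy stand-in dataset (not our actual training code):

```python
# Minimal sketch: shuffled, single-epoch loading of pairwise comparison data.
# PairwiseDataset is a toy stand-in for the real reward dataset.
from torch.utils.data import DataLoader, Dataset

class PairwiseDataset(Dataset):
    def __init__(self, pairs):
        self.pairs = pairs  # list of (chosen, rejected) text pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]

pairs = [(f"good answer {i}", f"bad answer {i}") for i in range(100)]
loader = DataLoader(
    PairwiseDataset(pairs),
    batch_size=8,
    shuffle=True,  # reshuffles every epoch, so batches are not grouped by source order
)

# A single pass over the data, matching the 1-epoch RM fine-tuning setting.
for chosen, rejected in loader:
    pass  # forward/backward on the pairwise ranking loss would go here
```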
Describe the feature
The following datasets do not appear to be supported in the README for training the reward model:

- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/instruct-synthetic-prompt-responses
Are there any plans for that? I have implemented these locally; would the project want this code?
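For example, here is a minimal sketch of how openai/webgpt_comparisons could be turned into (prompt, chosen, rejected) records with the Hugging Face datasets library; the field names follow the dataset card, and make_pairs is just an illustrative helper:

```python
# Sketch only: field names taken from the openai/webgpt_comparisons
# dataset card; adjust if the schema differs.
from datasets import load_dataset

def make_pairs(example):
    """Turn one comparison record into a (prompt, chosen, rejected) triple."""
    prompt = example["question"]["full_text"]
    if example["score_0"] > example["score_1"]:
        chosen, rejected = example["answer_0"], example["answer_1"]
    else:
        chosen, rejected = example["answer_1"], example["answer_0"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

ds = load_dataset("openai/webgpt_comparisons", split="train")
# Ties carry no preference signal for a reward model, so drop them first.
ds = ds.filter(lambda ex: ex["score_0"] != ex["score_1"])
pairs = ds.map(make_pairs)
```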
Also, in reward_dataset the special tokens are hard-coded, which limits generality. We could allow custom special tokens to be provided, to adapt to different tokenizers.
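For instance (a hypothetical sketch, not the current reward_dataset API), the special tokens could become constructor arguments that default to the tokenizer's own:

```python
# Hypothetical, tokenizer-agnostic reward dataset; class and argument
# names are illustrative, not the repo's actual API.
from torch.utils.data import Dataset

class RewardDataset(Dataset):
    def __init__(self, pairs, tokenizer, max_length=512,
                 bos_token=None, eos_token=None):
        self.pairs = pairs  # list of (prompt, chosen, rejected) triples
        self.tokenizer = tokenizer
        self.max_length = max_length
        # Fall back to the tokenizer's own special tokens instead of
        # hard-coding strings that only fit one model family.
        self.bos = bos_token if bos_token is not None else (tokenizer.bos_token or "")
        self.eos = eos_token if eos_token is not None else (tokenizer.eos_token or "")

    def __len__(self):
        return len(self.pairs)

    def _encode(self, prompt, response):
        text = f"{self.bos}{prompt}{response}{self.eos}"
        return self.tokenizer(text, max_length=self.max_length,
                              truncation=True, padding="max_length",
                              return_tensors="pt")

    def __getitem__(self, idx):
        prompt, chosen, rejected = self.pairs[idx]
        return self._encode(prompt, chosen), self._encode(prompt, rejected)
```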
We will also support these datasets soon.
@thaumstrial Could you share your code? Thank you very much. The reward dataset does not seem to fit tasks other than chat.
We will update it further soon; stay tuned!