thaumstrial opened this issue 1 year ago
btw, it's better to randomly shuffle the dataset, or the models will overfit.
Thanks for your feedback. We changed our RM training code a lot last week, and it will be released later this week.
The RM fine-tuning on these datasets is set to 1 epoch, following the RLHF papers. In any case, shuffling the dataset is necessary, and we will fix it in an upcoming PR.
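For reference, per-epoch shuffling comes for free from PyTorch's DataLoader; here is a minimal sketch with a toy stand-in dataset (not our actual training code):

```python
# Minimal sketch: shuffled, single-epoch loading of pairwise comparison data.
# PairwiseDataset is a toy stand-in for the real reward dataset.
from torch.utils.data import DataLoader, Dataset

class PairwiseDataset(Dataset):
    def __init__(self, pairs):
        self.pairs = pairs  # list of (chosen, rejected) text pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        return self.pairs[idx]

pairs = [(f"good answer {i}", f"bad answer {i}") for i in range(100)]
loader = DataLoader(
    PairwiseDataset(pairs),
    batch_size=8,
    shuffle=True,  # reshuffles every epoch, so batches are not grouped by source order
)

# A single pass over the data, matching the 1-epoch RM fine-tuning setting.
for chosen, rejected in loader:
    pass  # forward/backward on the pairwise ranking loss would go here
```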
Describe the feature
The following datasets do not appear to be supported in the README for training the reward model:

- openai/summarize_from_feedback
- openai/webgpt_comparisons
- Dahoas/instruct-synthetic-prompt-responses
Are there any plans for that? I have implemented these locally; would the project want this code?
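For example, here is a minimal sketch of how openai/webgpt_comparisons could be turned into (prompt, chosen, rejected) records with the Hugging Face datasets library; the field names follow the dataset card, and make_pairs is just an illustrative helper:

```python
# Sketch only: field names taken from the openai/webgpt_comparisons
# dataset card; adjust if the schema differs.
from datasets import load_dataset

def make_pairs(example):
    """Turn one comparison record into a (prompt, chosen, rejected) triple."""
    prompt = example["question"]["full_text"]
    if example["score_0"] > example["score_1"]:
        chosen, rejected = example["answer_0"], example["answer_1"]
    else:
        chosen, rejected = example["answer_1"], example["answer_0"]
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

ds = load_dataset("openai/webgpt_comparisons", split="train")
# Ties carry no preference signal for a reward model, so drop them first.
ds = ds.filter(lambda ex: ex["score_0"] != ex["score_1"])
pairs = ds.map(make_pairs)
```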
Also, in reward_dataset the special tokens are hard-coded, which limits generality. We could allow custom special tokens to be provided, to adapt to different tokenizers.
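For instance (a hypothetical sketch, not the current reward_dataset API), the special tokens could become constructor arguments that default to the tokenizer's own:

```python
# Hypothetical, tokenizer-agnostic reward dataset; class and argument
# names are illustrative, not the repo's actual API.
from torch.utils.data import Dataset

class RewardDataset(Dataset):
    def __init__(self, pairs, tokenizer, max_length=512,
                 bos_token=None, eos_token=None):
        self.pairs = pairs  # list of (prompt, chosen, rejected) triples
        self.tokenizer = tokenizer
        self.max_length = max_length
        # Fall back to the tokenizer's own special tokens instead of
        # hard-coding strings that only fit one model family.
        self.bos = bos_token if bos_token is not None else (tokenizer.bos_token or "")
        self.eos = eos_token if eos_token is not None else (tokenizer.eos_token or "")

    def __len__(self):
        return len(self.pairs)

    def _encode(self, prompt, response):
        text = f"{self.bos}{prompt}{response}{self.eos}"
        return self.tokenizer(text, max_length=self.max_length,
                              truncation=True, padding="max_length",
                              return_tensors="pt")

    def __getitem__(self, idx):
        prompt, chosen, rejected = self.pairs[idx]
        return self._encode(prompt, chosen), self._encode(prompt, rejected)
```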
We will also support these datasets soon.
@thaumstrial Could you share your code? Thank you very much. The reward dataset does not seem to fit tasks other than chat.
We will update it further soon; stay tuned!