Feature request
Add an option to RLOOTrainer that enables string-based reward models, built on metrics such as BLEU and Levenshtein distance, for evaluating model outputs.
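To make the request concrete, here is a minimal sketch of the kind of string-based reward I have in mind. It is plain Python and does not use any existing TRL API: the names `levenshtein_distance` and `string_reward` are hypothetical, and it assumes the trainer would decode completions to text and have reference strings available before calling the reward.

```python
from typing import List


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def string_reward(completions: List[str], references: List[str]) -> List[float]:
    """Hypothetical reward: 1.0 for an exact match, falling toward 0.0 as edits grow."""
    rewards = []
    for completion, reference in zip(completions, references):
        longest = max(len(completion), len(reference))
        if longest == 0:
            rewards.append(1.0)  # two empty strings count as a perfect match
            continue
        distance = levenshtein_distance(completion, reference)
        rewards.append(1.0 - distance / longest)
    return rewards
```

The resulting list of floats could then be converted to a tensor wherever the trainer currently expects reward model scores. How that hook would be exposed is exactly the open design question of this request.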
Motivation
Currently, the reward_model in RLOOTrainer accepts only tensor inputs, which prevents the use of string-based metrics as rewards. Supporting string comparison metrics would let users draw on a broader range of string similarity measures.
Your contribution
I am open to collaborating with the community to implement this feature!