achibb closed this 1 year ago
I don't think this is possible with PPO out of the box, so creating a weighted sum might be your best option. I suspect the model can learn to disentangle the different scores internally after seeing enough examples.
Maybe @edbeeching or @natolambert know more on the topic of multi-reward RL.
To start, adding them together is worth trying (you may need to tune the weights a bit for each term). Curious how it goes.
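For illustration, here is a minimal sketch of the weighted-sum idea in plain Python. The function name `combine_scores`, the example scores, and the weights are all hypothetical and not part of the library; the only assumption about PPO is that it expects one scalar reward per sample, so each score vector is collapsed to a single number before being handed to the trainer.

```python
from typing import Sequence


def combine_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a per-sample score vector into one scalar reward.

    Hypothetical helper: `scores` is [pos_score, gram_score, topic_score]
    and `weights` holds one tunable weight per score.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


# Made-up scores for a batch of two generated responses,
# ordered as [pos_score, gram_score, topic_score].
batch_scores = [
    [0.9, 0.5, 0.2],
    [0.1, 0.8, 0.7],
]

# Assumed starting weights; these would need tuning per task.
weights = [0.5, 0.3, 0.2]

# One scalar reward per response, ready to be wrapped in whatever
# tensor type the training loop expects.
rewards = [combine_scores(scores, weights) for scores in batch_scores]
```

If the individual scores live on very different scales, normalizing each one before weighting is probably worth trying, since otherwise the largest-magnitude term tends to dominate the sum.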
I'd say in the NLP domain, you're more likely to see success by making something like RLAIF work than by switching to a multi-objective formulation (still niche research). For example, combine the reward model with asking a model "which of these do I need to improve?".
Hi there, first of all, thanks for the great library! I was wondering if it is possible to pass a multidimensional score. For example, I want to judge my model in 3 categories: 1) positivity, 2) grammar, 3) topic fit. I think it makes sense to give it [pos_score, gram_score, topic_score] rather than only a weighted-sum score. Is that possible? Thanks!