huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Multi dimensional Score possible? #270

Closed (achibb closed this issue 1 year ago)

achibb commented 1 year ago

Hi there, first of all, thanks for the great library! I was wondering if it is possible to pass a multidimensional score? For example, I want to judge my model in 3 categories: 1) positivity, 2) grammar, 3) topic fit. I think it makes sense to pass not just a weighted-sum score but the vector [pos_score, gram_score, topic_score]. Is that possible? Thanks!

lvwerra commented 1 year ago

I don't think this is possible with PPO out of the box, and creating a weighted sum might be your best option. I'd suspect that the model can learn to disentangle the different scores internally after seeing enough examples.

Maybe @edbeeching or @natolambert know more on the topic of multi-reward RL.

natolambert commented 1 year ago

To start, adding them together is worth trying (you may need to tune the weights a bit for each term). Curious how it goes.
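A minimal sketch of the weighted-sum idea, in plain Python so it does not depend on any particular trainer API. The category names, the weight values, and the helper names `combine_scores` and `combine_batch` are all hypothetical and only for illustration; the weights in particular are untuned placeholders.

```python
from typing import List, Mapping, Optional

# Hypothetical, untuned weights for the three categories from the question.
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "positivity": 0.5,
    "grammar": 0.3,
    "topic": 0.2,
}


def combine_scores(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Collapse per-category scores into one scalar reward (weighted sum)."""
    if weights is None:
        weights = DEFAULT_WEIGHTS
    # Fail loudly if a category is missing instead of silently treating it as 0.
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def combine_batch(
    batch: List[Mapping[str, float]],
    weights: Optional[Mapping[str, float]] = None,
) -> List[float]:
    """Apply the weighted sum to every sample, giving one scalar per sample."""
    return [combine_scores(sample, weights) for sample in batch]


if __name__ == "__main__":
    batch = [
        {"positivity": 1.0, "grammar": 0.0, "topic": 0.5},
        {"positivity": 0.2, "grammar": 0.9, "topic": 0.7},
    ]
    print(combine_batch(batch))
```

Each resulting scalar would then stand in for the single per-sample reward, presumably wrapped in whatever type the trainer expects. If the categories live on very different scales, normalizing them before summing is probably worth trying alongside tuning the weights.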

I'd say in the NLP domain, you're more likely to see success by trying to make something like RLAIF work than by changing to a multi-objective problem (still niche research). Combine the reward model with asking a model "which of these do I need to improve".