Open vwxyzjn opened 1 year ago
This PR attempts to get benchmark results using TRL's sentiment pipeline as the reward signal instead of training a reward model.
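In this setup the reward for each generated text comes from a pretrained sentiment classifier rather than from a learned reward model. Below is a minimal sketch of turning classifier output into scalar rewards. It assumes the pipeline returns a list of `{"label", "score"}` dicts for every label of each text, and that the positive class is named `"POSITIVE"`. The helper `rewards_from_sentiment` is hypothetical, and the scores are made-up placeholders, not benchmark results.

```python
from typing import Any, Dict, List


def rewards_from_sentiment(
    pipe_outputs: List[List[Dict[str, Any]]],
    positive_label: str = "POSITIVE",  # assumed label name; depends on the classifier
) -> List[float]:
    """Map per-label classifier scores to one scalar reward per text (hypothetical helper)."""
    rewards = []
    for label_scores in pipe_outputs:
        # Use the positive-class score of this text as its reward.
        matches = [d["score"] for d in label_scores if d["label"] == positive_label]
        if not matches:
            raise ValueError(f"label {positive_label!r} not found in {label_scores}")
        rewards.append(float(matches[0]))
    return rewards


# Hand-written stand-in for what a sentiment pipeline might return for two texts.
fake_outputs = [
    [{"label": "NEGATIVE", "score": -1.2}, {"label": "POSITIVE", "score": 2.3}],
    [{"label": "NEGATIVE", "score": 1.8}, {"label": "POSITIVE", "score": -0.7}],
]
print(rewards_from_sentiment(fake_outputs))
```

Because the classifier is frozen, the reward function stays fixed during RL. This isolates the policy-optimization part of the benchmark from reward-model training.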