vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase
MIT License

get benchmark results with TRL's pipeline #21

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

This PR attempts to get some benchmark results using TRL's sentiment pipeline instead of training a reward model.
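A minimal sketch of how a sentiment classifier's output can stand in for a learned reward model. It assumes each sample's classifier output is a list of `{"label", "score"}` dicts, one per class, and that the score of the `POSITIVE` label is used as the scalar reward. This format is the one a Hugging Face `transformers` text-classification pipeline returns when asked for all class scores. The helper `rewards_from_sentiment` and the mocked outputs are hypothetical and are not taken from this PR or from TRL.

```python
from typing import Any, Dict, List


def rewards_from_sentiment(
    outputs: List[List[Dict[str, Any]]], target_label: str = "POSITIVE"
) -> List[float]:
    """Hypothetical helper: turn per-class classifier scores into one reward per sample."""
    rewards = []
    for per_sample in outputs:
        # Each sample has one {"label": ..., "score": ...} dict per class.
        for entry in per_sample:
            if entry["label"] == target_label:
                rewards.append(float(entry["score"]))
                break
        else:
            # No entry matched the requested label for this sample.
            raise KeyError(f"label {target_label!r} not found in {per_sample}")
    return rewards


# Mocked classifier outputs with made-up numbers (not benchmark results).
mock_outputs = [
    [{"label": "NEGATIVE", "score": -1.2}, {"label": "POSITIVE", "score": 2.3}],
    [{"label": "NEGATIVE", "score": 0.8}, {"label": "POSITIVE", "score": -0.5}],
]
print(rewards_from_sentiment(mock_outputs))  # [2.3, -0.5]
```

With this setup, no reward model is trained. The policy is scored directly by a fixed classifier, so the reward signal is "how positive the classifier thinks the continuation is".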

vwxyzjn commented 12 months ago