get benchmark results with TRL's pipeline

vwxyzjn / lm-human-preference-details

RLHF implementation details of OAI's 2019 codebase

MIT License

Open vwxyzjn opened 1 year ago

vwxyzjn commented 1 year ago

This PR attempts to get some benchmark results with TRL's sentiment pipeline instead of training a reward model.
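
For readers unfamiliar with the idea, here is a rough sketch of how a sentiment classifier can stand in for a learned reward model. It is not the code in this PR. The helper names `sentiment_rewards` and `build_imdb_classifier` are hypothetical, and the model name `lvwerra/distilbert-imdb` and its `POSITIVE` label are assumptions taken from TRL's sentiment example, not from this repo.

```python
from typing import Callable, Dict, List

# Output shape of a text-classification pipeline called on a list of texts
# with top_k=None: one list of {"label", "score"} dicts per input text.
ClassifierOutput = List[List[Dict[str, object]]]


def sentiment_rewards(
    classifier: Callable[[List[str]], ClassifierOutput],
    texts: List[str],
    positive_label: str = "POSITIVE",  # assumed label name
) -> List[float]:
    """Return one scalar reward per text: the score of the positive label."""
    rewards = []
    for scores in classifier(texts):
        by_label = {entry["label"]: float(entry["score"]) for entry in scores}
        rewards.append(by_label[positive_label])
    return rewards


def build_imdb_classifier() -> Callable[[List[str]], ClassifierOutput]:
    """Hypothetical helper wrapping a Hugging Face sentiment pipeline.

    Calling this downloads a model, so it is kept out of the import path.
    """
    from transformers import pipeline  # lazy import: only needed here

    pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")
    # top_k=None returns a score for every label;
    # function_to_apply="none" keeps raw logits instead of softmax outputs.
    return lambda texts: pipe(texts, top_k=None, function_to_apply="none")
```

In a PPO loop, `sentiment_rewards(build_imdb_classifier(), decoded_responses)` would replace the call to a trained reward model. Passing the classifier in as a callable keeps the reward extraction testable without downloading anything.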

vwxyzjn commented 12 months ago