pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl
MIT License
2.01k stars, 269 forks

[Feature Request] multi-turn reward for RLHF #2271

Open vmoens opened 4 days ago

vmoens commented 4 days ago

Implement multi-turn rewards for RLHF, as proposed in https://arxiv.org/pdf/2405.14655.