yamatokataoka / learning-from-human-preferences

Replication of Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017).
MIT License

High-level Design #12

Closed yamatokataoka closed 6 months ago

yamatokataoka commented 11 months ago

Design Document for RLHF

Introduction

This project replicates the research paper *Deep Reinforcement Learning from Human Preferences* (Christiano et al., 2017). We aim to:

Implementation

To achieve these goals, we will rely on the following tech stack:

(attached image: IMG_1761)

High-level design

Data Flow:

  1. The RL agent interacts with the MuJoCo environment.
  2. The agent's actions and resulting states are visualized in the web application.
  3. Users provide feedback based on the observed behavior.
  4. Feedback data is collected and stored in Redis.
  5. The reward model is trained using the feedback data.
  6. The learned reward model provides guidance for the RL agent's policy updates.
  7. The improved policy leads to better performance in the MuJoCo environment, showcased in the visualization.
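Steps 3–4 above collect human preferences and store them in Redis. As a rough sketch of what one feedback record might look like, assuming a redis-py-style client and a hypothetical schema (the field names `left_segment_id`, `right_segment_id`, `preference`, and the list key `feedback` are illustrative, not from the repo):

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FeedbackRecord:
    # IDs of the two trajectory segments shown to the user (hypothetical schema)
    left_segment_id: str
    right_segment_id: str
    # 0.0 = left preferred, 1.0 = right preferred, 0.5 = no preference
    preference: float


def push_feedback(client, record: FeedbackRecord, key: str = "feedback") -> None:
    """Append one feedback record to a Redis list as JSON.

    `client` is any object exposing rpush(key, value), e.g. a
    redis-py Redis instance.
    """
    client.rpush(key, json.dumps(asdict(record)))
```

Serializing records as JSON in a Redis list keeps the web application decoupled from the reward-model trainer, which can consume the list at its own pace.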
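Step 5 trains the reward model from pairwise preferences. The paper fits the reward predictor with a Bradley–Terry-style cross-entropy loss, where the probability of preferring one segment follows from the exponentiated sums of predicted per-step rewards. The repo's actual training code may differ; this is a minimal NumPy sketch of that loss, with `preference` denoting the probability mass the human placed on the *right* segment:

```python
import numpy as np


def preference_loss(r_left, r_right, preference):
    """Cross-entropy loss over a preference between two segments.

    P(left > right) = exp(sum r_left) / (exp(sum r_left) + exp(sum r_right)),
    following Christiano et al. (2017).

    r_left, r_right: predicted per-step rewards for each segment.
    preference: 0.0 (left preferred), 1.0 (right preferred), or 0.5 (tie).
    """
    z_left = np.sum(r_left)
    z_right = np.sum(r_right)
    # log-sum-exp via np.logaddexp for numerical stability
    log_denom = np.logaddexp(z_left, z_right)
    log_p_left = z_left - log_denom
    log_p_right = z_right - log_denom
    return -((1.0 - preference) * log_p_left + preference * log_p_right)
```

Minimizing this loss over stored feedback records yields the learned reward function that guides the policy updates in step 6.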