Closed yamatokataoka closed 6 months ago
Design Document for RLHF
Introduction
This project replicates the research paper Deep Reinforcement Learning from Human Preferences (Christiano et al., 2017). We aim to:
Implementation
To achieve these goals, we will rely on the following technologies:
Tech Stack:
High-level design
Data Flow:
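In Christiano et al. (2017), human comparisons of pairs of trajectory segments are used to fit a reward predictor r_hat, and the policy is trained with RL on r_hat's outputs. The predictor models the probability that a human prefers segment σ¹ over σ² as exp(Σ r_hat(o¹, a¹)) / (exp(Σ r_hat(o¹, a¹)) + exp(Σ r_hat(o², a²))), and it is fit by minimizing the cross-entropy between these predictions and the human labels. The Python sketch below covers only that loss for a single labelled pair. It assumes each segment is already reduced to a list of predicted per-step rewards, and the function names are hypothetical. None of this is taken from this project's own design.

```python
import math
from typing import Sequence


def _log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_probability(rewards_1: Sequence[float], rewards_2: Sequence[float]) -> float:
    """Predicted probability that segment 1 is preferred over segment 2.

    exp(R1) / (exp(R1) + exp(R2)) equals sigmoid(R1 - R2), where R1 and R2
    are the sums of the (hypothetical, precomputed) per-step rewards.
    """
    return math.exp(_log_sigmoid(sum(rewards_1) - sum(rewards_2)))


def preference_loss(rewards_1: Sequence[float], rewards_2: Sequence[float], mu_1: float) -> float:
    """Cross-entropy between the human label and the predicted preference.

    mu_1 is the label's probability mass on segment 1:
    1.0 if segment 1 was preferred, 0.0 if segment 2 was, 0.5 if judged equal.
    """
    if not 0.0 <= mu_1 <= 1.0:
        raise ValueError("mu_1 must lie in [0, 1]")
    diff = sum(rewards_1) - sum(rewards_2)
    log_p1 = _log_sigmoid(diff)   # log P[segment 1 preferred]
    log_p2 = _log_sigmoid(-diff)  # log P[segment 2 preferred]
    return -(mu_1 * log_p1 + (1.0 - mu_1) * log_p2)
```

For example, `preference_probability([0.2, 0.5], [0.1, 0.1])` is sigmoid(0.5) ≈ 0.62. The sketch computes the loss from the difference of the summed rewards through a log-sigmoid, so it stays finite for large reward gaps where exponentiating each sum directly would overflow. The paper adds refinements omitted here. It trains an ensemble of predictors and assumes a 10% chance that the human answers uniformly at random.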