natolambert / rlhf-book

Textbook on reinforcement learning from human feedback
https://rlhfbook.com/
MIT License

Chapter Plans #6

Open natolambert opened 4 months ago

natolambert commented 4 months ago

Here is a rough outline of what I would like to see in the book, and who will be writing it.

Introductions & History

  1. Introduction
  2. Economics, psychology, and philosophy of preference: VNM theory, Bradley-Terry, impossibility theorems, social choice, etc.
  3. Optimal control, deep RL, ML, etc.
  4. RLHF for LLM literature (pre-ChatGPT work), maybe summarize InstructGPT

Links:

Problem Specification

  1. Definitions, basic stuff, math
  2. Preference data collection
  3. Preference model training
  4. KL constraints and other penalties

Policy Optimization

  1. IFT / SFT / Chat Templates
  2. Rejection Sampling / Best of N
  3. PPO, REINFORCE, Policy Gradient
  4. DPO (Eric, Archit, Rafael)
  5. Other variants (short)

Advanced (optional)

  1. Constitutional AI (CAI)
  2. Synthetic vs human data
  3. Evaluation

Open Questions (TBD / optional)

  1. Reward model over-optimization