natolambert / rlhf-book

Textbook on reinforcement learning from human feedback

https://rlhfbook.com/

MIT License

Chapter Plans #6

Open natolambert opened 4 months ago

natolambert commented 4 months ago

Here is a rough outline of what I would like to see in the book, and who will be writing it.

Introductions & History

Introduction
Economics, Psychology, Philosophy of preference: VNM utility theory, Bradley-Terry, impossibility theorems, social choice, etc.
Optimal Control, Deep RL, ML, etc.
RLHF for LLM literature (pre-ChatGPT work), maybe summarize InstructGPT

Links:

https://arxiv.org/abs/2310.13595

Problem Specification

Definitions, basic stuff, math
Preference data collection
Preference model training
KL constraints and other penalties

Policy Optimization

IFT / SFT / Chat Templates
Rejection Sampling / Best of N
PPO, REINFORCE, Policy Gradient
DPO (Eric, Archit, Rafael)
Other variants (short)

Advanced (optional)

Constitutional AI (CAI)
Synthetic vs human data
Evaluation

Open Questions (TBD / optional)

Reward model over-optimization