Sec 35.3.5 (DDPG)
Currently the text just says: "The DDPG algorithm of [Lil+16], which stands for "deep deterministic policy gradient", uses the DQN method (Section 35.2.6) to update Q that is represented by deep neural networks." We expand on this a little, since DDPG is quite widely used.
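To make this concrete, here is a minimal sketch (not the book's code) of one DDPG update step in PyTorch: the critic is trained with a DQN-style TD target, except that the max over actions is replaced by the target actor's deterministic action, and the actor is improved by ascending Q(s, mu(s)), i.e., the deterministic policy gradient. The helper name ddpg_update, the hidden-layer sizes, learning rates, gamma, tau, and the random mini-batch are all illustrative assumptions.

    import copy
    import torch
    import torch.nn as nn

    obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005   # illustrative sizes and constants

    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, act_dim), nn.Tanh())
    critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                           nn.Linear(64, 1))
    actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
    actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

    def ddpg_update(s, a, r, s2, done):
        # Critic: DQN-style TD target, but max_a' Q(s', a') is replaced by the
        # target critic evaluated at the target actor's action mu_targ(s').
        with torch.no_grad():
            q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1)).squeeze(-1)
            q_targ = r + gamma * (1.0 - done) * q_next
        q = critic(torch.cat([s, a], dim=-1)).squeeze(-1)
        critic_loss = ((q - q_targ) ** 2).mean()
        critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

        # Actor: deterministic policy gradient, i.e., ascend Q(s, mu(s)).
        actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

        # Target networks slowly track the online networks (Polyak averaging).
        with torch.no_grad():
            for net, targ in [(actor, actor_targ), (critic, critic_targ)]:
                for p, p_t in zip(net.parameters(), targ.parameters()):
                    p_t.mul_(1 - tau).add_(tau * p)

    # Illustrative call on a random mini-batch of transitions (s, a, r, s', done).
    B = 32
    ddpg_update(torch.randn(B, obs_dim), torch.rand(B, act_dim) * 2 - 1,
                torch.randn(B), torch.randn(B, obs_dim), torch.zeros(B))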
Sec 35.4 (MBRL)
Significant expansion and restructuring of the content.
Sec 35.5.3 (deadly triad)
Added a brief discussion of gradient TD methods and target networks, which can help stabilize off-policy learning.
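For reference, the following is a minimal sketch (not the book's code) of the TDC / GTD(0) update for linear value estimation V(s) approximated as theta @ phi(s). The helper name tdc_update, the step sizes alpha and beta, and the random features are illustrative assumptions. The second weight vector w is trained to predict the TD error from the features; the correction term built from w turns the usual semi-gradient TD step into (approximately) a stochastic gradient of the projected Bellman error, which is what makes the update stable under off-policy sampling.

    import numpy as np

    def tdc_update(theta, w, phi, phi_next, r, gamma=0.99, alpha=0.01, beta=0.1):
        """One TDC / GTD(0) step for V(s) ~= theta @ phi(s) with linear features."""
        delta = r + gamma * theta @ phi_next - theta @ phi              # TD error
        theta_new = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
        w_new = w + beta * (delta - w @ phi) * phi                      # w predicts delta from phi
        return theta_new, w_new

    # Illustrative call with random features.
    d = 8
    theta, w = np.zeros(d), np.zeros(d)
    theta, w = tdc_update(theta, w, np.random.rand(d), np.random.rand(d), r=1.0)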
Sec 35.5.4 (new: off-policy in practice)
Added a very short new section listing common off-policy methods, for ease of reference.
Sec 35.6 (control as inference)
Added clearer subsection headings, to give the section more structure.
Moved the subsection on 'imitation learning' into its own Sec 35.7.
Sec 35.7 (new: imitation learning)
This now contains the content that used to be in Sec 35.6.3.
Sec 35.8 (new: "Other topics in RL")
Added brief discussions of various topics, such as general value functions (GVFs), temporal abstraction (options), partial observability, reward functions (including shaping and hacking), and offline RL.
To avoid too much divergence from the original text, I have rolled back these changes.
This new content will be added to a new tutorial on RL that I am writing.