Overview.
Reinforcement Learning [slides] [lecture note] [Video (in Chinese)].
Value-Based Learning [slides] [Video (in Chinese)].
Policy-Based Learning [slides] [Video (in Chinese)].
Actor-Critic Methods [slides] [Video (in Chinese)].
AlphaGo [slides] [Video (in Chinese)].
TD Learning.
Sarsa [slides] [Video (in Chinese)].
Q-learning [slides] [Video (in Chinese)].
Multi-Step TD Target [slides] [Video (in Chinese)].
Advanced Topics on Value-Based Learning.
Experience Replay (ER) & Prioritized ER [slides] [Video (in Chinese)].
Overestimation, Target Network, & Double DQN [slides] [Video (in Chinese)].
Dueling Networks [slides] [Video (in Chinese)].
Policy Gradient with Baseline.
Policy Gradient with Baseline [slides] [Video (in Chinese)].
REINFORCE with Baseline [slides] [Video (in Chinese)].
Advantage Actor-Critic (A2C) [slides] [Video (in Chinese)].
REINFORCE versus A2C [slides] [Video (in Chinese)].
Advanced Topics on Policy-Based Learning.
Trust-Region Policy Optimization (TRPO) [slides] [Video (in Chinese)].
Partial Observation and RNNs.
Dealing with Continuous Action Space.
Discrete versus Continuous Control [slides] [Video (in Chinese)].
Deterministic Policy Gradient (DPG) for Continuous Control [slides] [Video (in Chinese)].
Stochastic Policy Gradient for Continuous Control [slides] [Video (in Chinese)].
Multi-Agent Reinforcement Learning.
Basics and Challenges [slides] [Video (in Chinese)].
Centralized versus Decentralized [slides] [Video (in Chinese)].
Imitation Learning.
Inverse Reinforcement Learning.
Generative Adversarial Imitation Learning (GAIL).