This folder contains the latest drafts of selected chapters of the forthcoming book "Introduction to Markov Decision Processes" by Martin Puterman and Tim Chan. They are in varying degrees of completeness. Please share any comments on material or typos with the authors ASAP.
The intent of the book is to provide a rigorous yet accessible foundation in Markov decision processes (MDPs) and reinforcement learning (RL) by focusing on finite state and action models. The book is divided into three parts: Preliminaries, Classical Models, and Reinforcement Learning. It is not intended to cover all the latest work on RL but to provide a sufficient foundation for the reader to jump into current research. The book also contains a chapter on partially observed MDPs (POMDPs) but does not cover continuous-time models, which at this time do not appear to be a major research focus.
The book is aimed at fourth-year undergraduates and graduate students. We believe a suitable background for reading this book includes linear algebra, Markov chains, linear regression, and analysis. The reader should be comfortable with mathematical notation and a theorem-proof framework.
All algorithms and results are illustrated with examples and applications. Each chapter contains a large number of formulation, computational, and theoretical problems, along with some bibliographical remarks. The code (mostly in R) we used for the examples will be posted in each chapter's folder. The book uses the notation "cmax" and "arg cmax" to emphasize that, when results are represented in vector form, maximization is carried out componentwise. Moreover, we introduce the concept of state-action value functions early on and use it throughout the book, especially in the simulation and reinforcement learning sections.
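As a small illustration of the componentwise convention, the R sketch below (purely illustrative; the matrix q and its entries are invented for this example) computes cmax and arg cmax of a state-action value function stored as a matrix with one row per state and one column per action.

    # Illustrative only: a small state-action value matrix q,
    # with one row per state and one column per action.
    q <- matrix(c(1, 4,
                  3, 2,
                  0, 5),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("s1", "s2", "s3"), c("a1", "a2")))

    v <- apply(q, 1, max)         # cmax: the componentwise (per-state) maximum
    d <- apply(q, 1, which.max)   # arg cmax: a maximizing action in each state
    print(v)   # s1 s2 s3 -> 4 3 5
    print(d)   # s1 s2 s3 -> 2 1 2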
One convention we try to use consistently is that subscripts on states and actions refer to specific states and actions, while superscripts refer to states and actions generated at different iterations of an algorithm or simulation. For example, s_1 and s_2 denote two particular states, whereas s^n denotes the state generated at iteration n.
Part I - Preliminaries
Chapter 1 Introduction
Chapter 2 Foundations
Chapter 3 Applications
Part II - Classical Models
Chapter 4 Finite Horizon MDPs
Chapter 5 Infinite Horizon Discounted MDPs
Chapter 6 Infinite Horizon Total Reward MDPs
Chapter 7 Infinite Horizon Average Reward MDPs
Chapter 8 Partially Observed MDPs
Part III - Reinforcement Learning
Chapter 9 Approximations in Tabular Models
Chapter 10 Simulation in Tabular Models
Chapter 11 Reinforcement Learning
Chapter 12 Simulation with Function Approximation
Appendices (Markov Chains, Linear Programming)
Bibliography