martyput / MDP_book


This folder contains the latest drafts of selected chapters of the forthcoming book "Introduction to Markov Decision Processes" by Martin Puterman and Tim Chan. The chapters are in varying degrees of completeness. Please share any comments on the material or any typos with the authors as soon as possible.

Book Objective

The intent of the book is to provide a rigorous yet accessible foundation for Markov decision processes (MDPs) and reinforcement learning (RL) by focusing on finite state and action models. The book is divided into three parts: Preliminaries, Classical Models, and Reinforcement Learning. It is not intended to cover all the latest work on RL, but rather to provide a sufficient foundation for the reader to jump into current research. The book also contains a chapter on partially observed MDPs (POMDPs) but will not cover continuous-time models, which do not currently appear to be a major research focus.

Requisite Background

The book is aimed at fourth-year undergraduates and graduate students. We believe a suitable background for reading this book includes linear algebra, Markov chains, linear regression, and analysis. The reader should be comfortable with mathematical notation and a theorem-proof framework.

Features

All algorithms and results are illustrated with examples and applications. Each chapter contains a large number of formulation, computation, and theoretical problems, along with some bibliographical remarks. Code (mostly in R) used for the examples will be posted in each chapter's folder. The book uses the notation "cmax" and "arg cmax" to emphasize that when results are represented in vector form, maximization is carried out componentwise (see the sketch below). Moreover, we introduce the concept of state-action value functions early on and use it throughout the book, especially in the simulation and reinforcement learning sections.
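To make the componentwise convention concrete, here is a minimal R sketch, written for this README rather than taken from the book's posted code. It performs one value-iteration update on a hypothetical two-state, two-action MDP: the matrix q holds the state-action values, "cmax" corresponds to the maximum of each row, and "arg cmax" to the maximizing action in each row. All names and numbers are illustrative assumptions.

```r
states  <- c("s1", "s2")
actions <- c("a1", "a2")

# r[s, a]: immediate reward for taking action a in state s (made-up numbers)
r <- matrix(c( 5, 10,
              -1,  2),
            nrow = 2, byrow = TRUE,
            dimnames = list(states, actions))

# P[[a]][s, s']: transition probabilities under action a (each row sums to 1)
P <- list(
  a1 = matrix(c(0.7, 0.3,
                0.4, 0.6), nrow = 2, byrow = TRUE),
  a2 = matrix(c(0.9, 0.1,
                0.2, 0.8), nrow = 2, byrow = TRUE)
)

gamma <- 0.95        # discount factor
v <- c(0, 0)         # current value-function estimate, one entry per state

# State-action value function: q[s, a] = r(s, a) + gamma * sum_s' P(s'|s,a) v(s')
q <- sapply(actions, function(a) r[, a] + gamma * P[[a]] %*% v)

# "cmax" and "arg cmax": maximize over actions separately in each row (state)
v_new  <- apply(q, 1, max)                  # componentwise maximum over actions
policy <- actions[apply(q, 1, which.max)]   # maximizing action in each state
```

Here v_new is the vector cmax of q and policy is the corresponding arg cmax; iterating this update is ordinary value iteration expressed in vector form.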

One convention we try to use consistently is that subscripts on states and actions refer to specific states and actions, while superscripts refer to states and actions generated at different iterations of an algorithm or simulation. For example, s_2 denotes a particular state, while s^n denotes the state generated at the n-th iteration.

Book Chapters

Part I - Preliminaries

Chapter 1 Introduction

Chapter 2 Foundations

Chapter 3 Applications

Part II - Classical Models

Chapter 4 Finite Horizon MDPs

Chapter 5 Infinite Horizon Discounted MDPs

Chapter 6 Infinite Horizon Total Reward MDPs

Chapter 7 Infinite Horizon Average Reward MDPs

Chapter 8 Partially Observed MDPs

Part III - Reinforcement Learning

Chapter 9 Approximations in Tabular Models

Chapter 10 Simulation in Tabular Models

Chapter 11 Reinforcement Learning

Chapter 12 Simulation with Function Approximation

Appendices (Markov Chains, Linear Programming)

Bibliography