jsphon opened this issue 7 years ago
There are several families of these algorithms, for example:
The Dynamic Programming methods might not be suitable for my use cases, as they require the environment's dynamics to be given as a set of transition probabilities (Sutton, Chapter 4: Dynamic Programming, first paragraph).
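To make the requirement concrete, here is a minimal sketch of iterative policy evaluation on a hypothetical two-state MDP. The transition matrix `P` and reward vector `R` are made-up numbers; the point is that DP cannot run without them:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy.
# P[s, s'] = transition probability, R[s] = expected reward.
# These known probabilities are exactly what DP methods require.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
R = np.array([1.0, 0.0])
gamma = 0.9

def policy_evaluation(P, R, gamma, tol=1e-8):
    """Iterate the Bellman expectation backup V <- R + gamma * P @ V."""
    V = np.zeros(len(R))
    while True:
        V_new = R + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V = policy_evaluation(P, R, gamma)
```

Since the dynamics are known, the result can be checked against the closed form `V = (I - gamma * P)^-1 R`.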
Monte Carlo might be more suitable, as it learns from experience, which I will be able to generate. But it needs complete episodes, which some of my use cases won't have.
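As a sketch of the episode requirement, here is first-visit Monte Carlo value estimation on a hypothetical three-state chain (states 0 and 1, terminal state 2, cost of -1 per step); all numbers and the toy environment are assumptions for illustration:

```python
import random
from collections import defaultdict

# First-visit Monte Carlo value estimation from sampled episodes.
# Hypothetical chain: from state s, move to s+1 with prob 0.7, else
# stay; state 2 is terminal; reward is -1 per step.
random.seed(0)
gamma = 1.0

def generate_episode():
    """Return a list of (state, reward) pairs for one complete episode."""
    episode, state = [], 0
    while state != 2:
        next_state = state + 1 if random.random() < 0.7 else state
        episode.append((state, -1.0))
        state = next_state
    return episode

def mc_first_visit(num_episodes=5000):
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = generate_episode()
        G = 0.0
        # Walk backwards through the episode, accumulating the return G.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit: only record G if s was not visited earlier.
            if state not in (s for s, _ in episode[:t]):
                returns[state].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

V = mc_first_visit()
```

The estimates only become available once each episode terminates, which is why incomplete episodes are a problem for this family.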
TD methods don't need complete episodes. So these might be more in line with what I require.
It would be good to have both Monte Carlo and TD methods, to check that both give approximately the same results.
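To show why TD methods avoid the episode requirement, here is a sketch of tabular TD(0) prediction on a hypothetical two-state continuing task with no episode boundaries; the dynamics and all constants are assumptions:

```python
import random

# Tabular TD(0) on a hypothetical continuing task: state 0 yields
# reward 1, state 1 yields 0, and the next state is 0 or 1 with equal
# probability. Learning happens one transition at a time, so no
# complete episodes are needed.
random.seed(0)
gamma, alpha = 0.9, 0.1
V = [0.0, 0.0]

def step(state):
    reward = 1.0 if state == 0 else 0.0
    next_state = 0 if random.random() < 0.5 else 1
    return reward, next_state

state = 0
for _ in range(20000):
    reward, next_state = step(state)
    # TD(0) update: move V(s) toward the bootstrapped target.
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])
    state = next_state
```

For this toy chain the Bellman equations give V(0) = 5.5 and V(1) = 4.5, so the estimates should hover near those values.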
TD Methods include:
- Sarsa
- Q-Learning
- Expected Sarsa
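The three TD control methods above differ only in the bootstrap target. As a sketch, here is tabular Q-learning on a hypothetical four-state deterministic chain (everything about the environment is assumed); the comment notes how Sarsa and Expected Sarsa would change the target:

```python
import random

# Tabular Q-learning on a hypothetical 4-state chain: action 1 moves
# right, action 0 moves left; reaching state 3 gives reward 1 and ends
# the episode. Sarsa would replace max(Q[next_state]) with
# Q[next_state][a'] for the action actually taken next; Expected Sarsa
# would use the policy's expectation over Q[next_state].
random.seed(0)
gamma, alpha, epsilon = 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(4)]

def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

for _ in range(500):
    state = 0
    while True:
        # Epsilon-greedy behaviour policy.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = env_step(state, action)
        # Q-learning backup: bootstrap from the greedy next action.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        if done:
            break
        state = next_state
```

For this chain the optimal action values along the rightward path are 0.81, 0.9 and 1.0, which the table should approach.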
Dynamic Programming methods include:
- Policy Evaluation
- Policy Improvement
- Policy Iteration
- Value Iteration
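For completeness, here is a minimal sketch of Value Iteration on a hypothetical two-state, two-action MDP with known deterministic dynamics (all numbers assumed), sweeping the Bellman optimality backup until the largest update falls below a tolerance:

```python
# Hypothetical 2-state, 2-action MDP with known deterministic dynamics.
# transitions[s][a] = (next_state, reward); these are assumed numbers.
transitions = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 0.5)},
}
gamma = 0.9

def value_iteration(tol=1e-10):
    V = {0: 0.0, 1: 0.0}
    while True:
        delta = 0.0
        for s in V:
            # Bellman optimality backup: max over actions.
            best = max(r + gamma * V[s2]
                       for a, (s2, r) in transitions[s].items())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
```

Solving the Bellman optimality equations by hand for this toy problem gives V(0) = 5.5 and V(1) = 5.0, so the sweep should converge to those values.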