jsphon / reinforcement_learning

Python Package For Reinforcement Learning

Implement Policyxxx #12

Open jsphon opened 7 years ago

jsphon commented 7 years ago

Policy Evaluation
Policy Improvement
Policy Iteration
Value Iteration

jsphon commented 7 years ago

There are different versions of these algorithms, e.g.:

The Dynamic Programming methods might not be suitable for my use cases, as they require the environment's dynamics, which are given by a set of transition probabilities (Sutton and Barto, Chapter 4, Dynamic Programming, first paragraph).
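To make the dependence on the dynamics concrete, here is a minimal value iteration sketch. It is not code from this package: the array layouts `P[a, s, s2]` (transition probabilities) and `R[a, s]` (expected rewards), the function name, and the toy two-state MDP are all assumptions chosen for illustration. The point is that the update cannot even be written down without `P`.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Value iteration for a small tabular MDP.

    Assumed (hypothetical) layout:
      P[a, s, s2] = probability of moving from s to s2 under action a
      R[a, s]     = expected immediate reward for taking action a in s
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman optimality backup: needs the full transition model P.
        Q = R + gamma * (P @ V)          # shape (n_actions, n_states)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * (P @ V)
    return V, Q.argmax(axis=0)


# Toy MDP (an assumption, for illustration): action 0 stays, action 1 switches state.
# Staying in state 1 pays a reward of 1; everything else pays 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 0.0]])
V, policy = value_iteration(P, R, gamma=0.9)
```

In this toy problem the greedy policy switches out of state 0 and then stays in state 1. When `P` is unknown, as in the use cases described here, this backup is not available, which is what motivates the sample-based methods.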

Monte Carlo might be more suitable, as it learns from experience, which I will be able to generate. But it needs complete episodes, which some of my use cases won't have.

TD methods don't need complete episodes, so they might be more in line with what I require.

It will be good to have both Monte Carlo and TD methods, to check that both give approximately the same results.
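A rough sketch of such a cross-check, assuming a simple test environment: a five-state random walk that starts in the middle, terminates off either end, and pays 1 only on the right exit, so the true state values are 1/6 to 5/6. The environment, the function names, and the step size are illustrative assumptions, not part of this package. Both estimators consume the same recorded episodes; Monte Carlo waits for the final return, while TD(0) updates after every step.

```python
import random

N_STATES = 5  # non-terminal states 0..4; stepping off either end terminates
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def generate_episode(rng):
    """Return a list of (state, reward, next_state); next_state is None on termination."""
    state = N_STATES // 2
    transitions = []
    while state is not None:
        nxt = state + rng.choice((-1, 1))
        if nxt < 0:
            transitions.append((state, 0.0, None))   # left exit, reward 0
            state = None
        elif nxt >= N_STATES:
            transitions.append((state, 1.0, None))   # right exit, reward 1
            state = None
        else:
            transitions.append((state, 0.0, nxt))
            state = nxt
    return transitions


def monte_carlo(episodes):
    """Every-visit Monte Carlo prediction with sample averages (undiscounted)."""
    totals = [0.0] * N_STATES
    counts = [0] * N_STATES
    for episode in episodes:
        G = 0.0
        # Walk backwards so G is the return following each visit.
        for state, reward, _ in reversed(episode):
            G += reward
            totals[state] += G
            counts[state] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def td0(episodes, alpha=0.02):
    """Tabular TD(0) prediction with a constant step size (undiscounted)."""
    V = [0.5] * N_STATES
    for episode in episodes:
        for state, reward, nxt in episode:
            # Bootstrap from the current estimate of the next state.
            target = reward + (V[nxt] if nxt is not None else 0.0)
            V[state] += alpha * (target - V[state])
    return V


rng = random.Random(0)
episodes = [generate_episode(rng) for _ in range(5000)]
mc_values = monte_carlo(episodes)
td_values = td0(episodes)
```

With a constant step size the TD(0) estimate keeps fluctuating around the true values, so the two results should be compared with a tolerance rather than for exact equality.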

jsphon commented 7 years ago

TD Methods include:

Sarsa:

Q-learning:

Expected Sarsa:
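In their standard tabular one-step forms, these three methods share the same update and differ only in the target they bootstrap from. The sketch below follows the textbook definitions; the function names, the `Q[state, action]` array layout, and the epsilon-greedy policy used for Expected Sarsa are assumptions for illustration, not this package's API.

```python
import numpy as np


def sarsa_target(Q, reward, next_state, next_action, gamma):
    """On-policy: bootstrap from the action actually taken in the next state."""
    return reward + gamma * Q[next_state, next_action]


def q_learning_target(Q, reward, next_state, gamma):
    """Off-policy: bootstrap from the greedy action in the next state."""
    return reward + gamma * np.max(Q[next_state])


def expected_sarsa_target(Q, reward, next_state, gamma, epsilon):
    """Bootstrap from the expected value under an (assumed) epsilon-greedy policy."""
    n_actions = Q.shape[1]
    probs = np.full(n_actions, epsilon / n_actions)
    probs[np.argmax(Q[next_state])] += 1.0 - epsilon
    return reward + gamma * np.dot(probs, Q[next_state])


def td_update(Q, state, action, target, alpha):
    """Shared update: move Q[state, action] a fraction alpha towards the target.

    For a terminal next state the target should be just the reward.
    """
    Q[state, action] += alpha * (target - Q[state, action])


# Hypothetical two-state, two-action table, for illustration only.
Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])
target = q_learning_target(Q, reward=1.0, next_state=1, gamma=0.5)
td_update(Q, state=0, action=0, target=target, alpha=0.1)
```

With `epsilon=0` the Expected Sarsa target coincides with the Q-learning target, since the policy it averages over is then purely greedy.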