jsphon opened this issue 7 years ago
There are several families of these algorithms, for example:
The Dynamic Programming methods might not be suitable for my use cases, as they require the environment's dynamics to be given as a set of transition probabilities (Sutton, Chapter 4: Dynamic Programming, first paragraph).
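To make the requirement concrete, here is a minimal sketch of iterative policy evaluation on a hypothetical two-state MDP. The transition matrix `P` and reward vector `R` are made-up numbers; the point is that DP cannot run without them:

```python
import numpy as np

# Hypothetical 2-state MDP under a fixed policy.
# P[s, s'] = transition probability, R[s] = expected reward.
# These known probabilities are exactly what DP methods require.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
R = np.array([1.0, 0.0])
gamma = 0.9

def policy_evaluation(P, R, gamma, tol=1e-8):
    """Iterate the Bellman expectation backup V <- R + gamma * P @ V."""
    V = np.zeros(len(R))
    while True:
        V_new = R + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

V = policy_evaluation(P, R, gamma)
```

Since the dynamics are known, the result can be checked against the closed form `V = (I - gamma * P)^-1 R`.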
Monte Carlo might be more suitable, as it learns from experience, which I will be able to generate. But it needs complete episodes, which some of my use cases won't have.
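As a sketch of the episode requirement, here is first-visit Monte Carlo value estimation on a hypothetical three-state chain (states 0 and 1, terminal state 2, cost of -1 per step); all numbers and the toy environment are assumptions for illustration:

```python
import random
from collections import defaultdict

# First-visit Monte Carlo value estimation from sampled episodes.
# Hypothetical chain: from state s, move to s+1 with prob 0.7, else
# stay; state 2 is terminal; reward is -1 per step.
random.seed(0)
gamma = 1.0

def generate_episode():
    """Return a list of (state, reward) pairs for one complete episode."""
    episode, state = [], 0
    while state != 2:
        next_state = state + 1 if random.random() < 0.7 else state
        episode.append((state, -1.0))
        state = next_state
    return episode

def mc_first_visit(num_episodes=5000):
    returns = defaultdict(list)
    for _ in range(num_episodes):
        episode = generate_episode()
        G = 0.0
        # Walk backwards through the episode, accumulating the return G.
        for t in reversed(range(len(episode))):
            state, reward = episode[t]
            G = gamma * G + reward
            # First-visit: only record G if s was not visited earlier.
            if state not in (s for s, _ in episode[:t]):
                returns[state].append(G)
    return {s: sum(g) / len(g) for s, g in returns.items()}

V = mc_first_visit()
```

The estimates only become available once each episode terminates, which is why incomplete episodes are a problem for this family.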
TD methods don't need complete episodes. So these might be more in line with what I require.
It would be good to have both Monte Carlo and TD methods, to check that both give approximately the same results.
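To show why TD methods avoid the episode requirement, here is a sketch of tabular TD(0) prediction on a hypothetical two-state continuing task with no episode boundaries; the dynamics and all constants are assumptions:

```python
import random

# Tabular TD(0) on a hypothetical continuing task: state 0 yields
# reward 1, state 1 yields 0, and the next state is 0 or 1 with equal
# probability. Learning happens one transition at a time, so no
# complete episodes are needed.
random.seed(0)
gamma, alpha = 0.9, 0.1
V = [0.0, 0.0]

def step(state):
    reward = 1.0 if state == 0 else 0.0
    next_state = 0 if random.random() < 0.5 else 1
    return reward, next_state

state = 0
for _ in range(20000):
    reward, next_state = step(state)
    # TD(0) update: move V(s) toward the bootstrapped target.
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])
    state = next_state
```

For this toy chain the Bellman equations give V(0) = 5.5 and V(1) = 4.5, so the estimates should hover near those values.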
TD Methods include:
- Sarsa
- Q-Learning
- Expected Sarsa
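The three TD control methods above differ only in the bootstrap target. As a sketch, here is tabular Q-learning on a hypothetical four-state deterministic chain (everything about the environment is assumed); the comment notes how Sarsa and Expected Sarsa would change the target:

```python
import random

# Tabular Q-learning on a hypothetical 4-state chain: action 1 moves
# right, action 0 moves left; reaching state 3 gives reward 1 and ends
# the episode. Sarsa would replace max(Q[next_state]) with
# Q[next_state][a'] for the action actually taken next; Expected Sarsa
# would use the policy's expectation over Q[next_state].
random.seed(0)
gamma, alpha, epsilon = 0.9, 0.5, 0.1
Q = [[0.0, 0.0] for _ in range(4)]

def env_step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(3, state + 1)
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward, next_state == 3

for _ in range(500):
    state = 0
    while True:
        # Epsilon-greedy behaviour policy.
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = 0 if Q[state][0] > Q[state][1] else 1
        next_state, reward, done = env_step(state, action)
        # Q-learning backup: bootstrap from the greedy next action.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        if done:
            break
        state = next_state
```

For this chain the optimal action values along the rightward path are 0.81, 0.9 and 1.0, which the table should approach.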
Dynamic Programming methods include:
- Policy Evaluation
- Policy Improvement
- Policy Iteration
- Value Iteration
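For completeness, here is a minimal sketch of Value Iteration on a hypothetical two-state, two-action MDP with known deterministic dynamics (all numbers assumed), sweeping the Bellman optimality backup until the largest update falls below a tolerance:

```python
# Hypothetical 2-state, 2-action MDP with known deterministic dynamics.
# transitions[s][a] = (next_state, reward); these are assumed numbers.
transitions = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 0.5)},
}
gamma = 0.9

def value_iteration(tol=1e-10):
    V = {0: 0.0, 1: 0.0}
    while True:
        delta = 0.0
        for s in V:
            # Bellman optimality backup: max over actions.
            best = max(r + gamma * V[s2]
                       for a, (s2, r) in transitions[s].items())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration()
```

Solving the Bellman optimality equations by hand for this toy problem gives V(0) = 5.5 and V(1) = 5.0, so the sweep should converge to those values.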