sovelten opened this issue 8 years ago
Good point. Since nobody has responded, what are you using as an alternative for this kind of model-free learning?
I have the same question: what is everyone using as an alternative for model-free learning?
Well, you can either build the transition probabilities into the MDP directly and then use methods such as value iteration to find a policy, or you can build the transition probabilities into a simulator and have a reinforcement learning agent learn a policy from interactions with that simulator. You can find many RL packages on GitHub, but I don't have direct experience with any of them.
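For the first alternative, here is a minimal value-iteration sketch assuming the transition model is known. The function name `value_iteration`, the array layouts `P[a][s][s']` and `R[s][a]`, and the parameter defaults are illustrative choices, not the API of any particular package.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Illustrative value iteration for a known model.

    P : array of shape (n_actions, n_states, n_states), P[a][s][s'] is the
        probability of moving from s to s' under action a.
    R : array of shape (n_states, n_actions), the immediate reward R[s][a].
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a][s][s'] * V[s']
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new  # greedy policy and value function
        V = V_new
```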
It seems that all the algorithms require you to pass a transition probability table and a reward vector, yet much of the usefulness of algorithms such as QLearning comes from the fact that they don't need these values to estimate policies.
Is this by design? A good update to the library would be to enable model-free learning, because most of the time you don't know the model and have to simulate it. This would make the library much more useful to more people.
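To illustrate the requested feature, here is a hedged sketch of tabular Q-learning that only interacts with a simulator and never sees a transition table or reward vector. The `env` object, with a `reset()` method returning a state and a `step(action)` method returning `(next_state, reward, done)`, is a hypothetical interface, not something the library currently provides.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Illustrative tabular Q-learning against a simulator (model-free)."""
    Q = np.zeros((n_states, n_actions))
    rng = np.random.default_rng()
    for _ in range(episodes):
        s = env.reset()          # hypothetical simulator interface
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(Q[s].argmax())
            s_next, r, done = env.step(a)
            # temporal-difference update; the transition model is never used
            target = r + gamma * Q[s_next].max() * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q.argmax(axis=1), Q   # greedy policy and learned Q-table
```

The key point is that the only inputs are the simulator and the sizes of the state and action spaces; the probabilities are only ever sampled, never supplied.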