sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python
BSD 3-Clause "New" or "Revised" License

Model-free algorithms depend on model #19

Open sovelten opened 7 years ago

sovelten commented 7 years ago

It seems that all the algorithms require you to pass a transition probability matrix and a reward vector. However, much of the usefulness of algorithms such as Q-learning comes from the fact that they don't need these values to estimate policies.

Is this by design? A good addition to the library would be support for model-free learning: most of the time you don't know the model and have to simulate it instead. That would make the library useful to many more people.
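For illustration, a minimal sketch of tabular Q-learning that only ever touches a simulator, never a transition matrix. The two-state `step` function below is a hypothetical stand-in environment (not part of pymdptoolbox): action 0 stays put for a small reward, action 1 flips the state 80% of the time and pays off when landing in state 1. The learner sees only sampled transitions.

```python
import random

random.seed(0)

N_STATES, N_ACTIONS = 2, 2

def step(state, action):
    """Simulate one transition; the learner never sees these probabilities."""
    if action == 0:                      # "safe" action: stay put, small reward
        return state, 1.0
    # "risky" action: move to the other state 80% of the time
    next_state = 1 - state if random.random() < 0.8 else state
    reward = 5.0 if next_state == 1 else 0.0
    return next_state, reward

def q_learning(episodes=2000, horizon=20, alpha=0.1, gamma=0.9, eps=0.1):
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        for _ in range(horizon):
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            # Q-learning update: bootstrap from the best next-state value
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a])
              for s in range(N_STATES)]
    return Q, policy

Q, policy = q_learning()
print(Q, policy)
```

By contrast, pymdptoolbox's `mdptoolbox.mdp.QLearning` takes `P` and `R` in its constructor and samples from them internally, which is exactly the limitation this issue is about.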

mrebhan commented 6 years ago

Good point. As nobody has responded: what are you using as an alternative for such model-free learning?

ajaymaity commented 3 years ago

I have the same question: what is everybody using as an alternative for model-free learning?

BoZenKhaa commented 3 years ago

Well, you have two options: either build the transition probabilities into the MDP directly and use a method such as value iteration to find a policy, or build them into a simulator and have a reinforcement learning agent learn a policy from interactions with that simulator. You can find many RL packages on GitHub, but I don't have direct experience with any of them.
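The first route (known model) can be sketched in a few lines of plain Python. This is the same fixed-point iteration that `mdptoolbox.mdp.ValueIteration` runs when you hand it `P` and `R`; the two-state MDP here is a hypothetical example, not anything from the library. Action 0 stays put for expected reward 1; action 1 flips the state with probability 0.8 and pays 5 for landing in state 1 (so its expected reward is 4 from state 0 and 1 from state 1).

```python
# Transition probabilities P[a][s][s2] and expected rewards R[s][a],
# known up front -- this is the model-based setting.
P = [
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
    [[0.2, 0.8], [0.8, 0.2]],   # action 1: flip state with probability 0.8
]
R = [[1.0, 4.0],                # state 0: E[r | a=0], E[r | a=1]
     [1.0, 1.0]]                # state 1

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    n_states, n_actions = len(R), len(P)
    V = [0.0] * n_states
    while True:
        # One Bellman backup: Q(s,a) = R(s,a) + gamma * sum_s2 P(s2|s,a) V(s2)
        Q = [[R[s][a] + gamma * sum(P[a][s][s2] * V[s2]
                                    for s2 in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        V_new = [max(Q[s]) for s in range(n_states)]
        if max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol:
            policy = [max(range(n_actions), key=lambda a: Q[s][a])
                      for s in range(n_states)]
            return V_new, policy
        V = V_new

V, policy = value_iteration(P, R)
print(V, policy)   # the greedy policy picks the risky action in both states
```

With pymdptoolbox itself the equivalent would be along the lines of `vi = mdptoolbox.mdp.ValueIteration(P, R, 0.9); vi.run(); vi.policy`, with `P` of shape (A, S, S) and `R` of shape (S, A). The second route replaces `P` and `R` with a `step(state, action)` simulator and an RL agent.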