sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python
BSD 3-Clause "New" or "Revised" License

Why were the epsilon in Q-learning and the Q update rule changed? Is this better? #34

Open: baimengwei opened this issue 4 years ago

baimengwei commented 4 years ago

The epsilon used is 1 - 1/log(n + 2) (the probability of choosing the greedy action), and the Q update uses a step size of 1/sqrt(n + 2). This seems like a good choice, but is there any basis for doing so?
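For reference, here is a minimal Python sketch of the two schedules described above inside one tabular Q-learning step. It assumes the natural logarithm (Python's `math.log`), an iteration counter `n` starting at 1, and a plain list-of-lists Q table. The helper names (`greedy_probability`, `step_size`, `choose_action`, `q_update`) are hypothetical and only illustrate the formulas; they are not part of pymdptoolbox's API.

```python
import math
import random


def greedy_probability(n: int) -> float:
    """Probability of taking the greedy action at iteration n: 1 - 1/log(n + 2).

    Assumes the natural log. The exploration probability is therefore
    1/log(n + 2), which shrinks toward 0 as n grows.
    """
    return 1.0 - 1.0 / math.log(n + 2)


def step_size(n: int) -> float:
    """Decaying update coefficient 1/sqrt(n + 2) applied to the TD error."""
    return 1.0 / math.sqrt(n + 2)


def choose_action(q_row, n, rng):
    """Hypothetical epsilon-greedy choice over one row of the Q table."""
    if rng.random() < greedy_probability(n):
        # Exploit: index of the largest Q value in this state.
        return max(range(len(q_row)), key=lambda a: q_row[a])
    # Explore: uniformly random action.
    return rng.randrange(len(q_row))


def q_update(q, s, a, r, s_new, discount, n):
    """One tabular Q-learning update using the 1/sqrt(n + 2) step size."""
    delta = r + discount * max(q[s_new]) - q[s][a]  # temporal-difference error
    q[s][a] += step_size(n) * delta
    return q[s][a]
```

Both schedules decay slowly: at n = 1000 the greedy probability is only about 0.855 (so roughly 14.5% of actions are still random) and the step size is about 0.032.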