Why were the epsilon in Q-learning and the Q update rule changed? Is this better?

sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python

BSD 3-Clause "New" or "Revised" License


Open baimengwei opened 4 years ago

baimengwei commented 4 years ago

The epsilon used is 1 - 1/log(n + 2), and the Q update uses 1/sqrt(n + 2). This seems like a good choice, but is there any basis for it?
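For readers who want to see how these two quantities behave, here is a minimal sketch in plain Python. It is not the toolbox's code: the function names (`schedule_value`, `step_size`, `choose_action`, `q_update`) and the discount of 0.9 are hypothetical. It assumes the natural logarithm, that n starts at 1, and that the first expression is compared against a uniform random draw to decide whether to act greedily.

```python
import math
import random


def schedule_value(n):
    """1 - 1/log(n + 2), the quantity the question calls epsilon (natural log assumed)."""
    return 1.0 - 1.0 / math.log(n + 2)


def step_size(n):
    """1/sqrt(n + 2), the factor applied to the Q update."""
    return 1.0 / math.sqrt(n + 2)


def choose_action(q_row, n, rng=random):
    # Assumption: act greedily when a uniform draw falls below the schedule
    # value, otherwise pick a uniformly random action.
    if rng.random() < schedule_value(n):
        return max(range(len(q_row)), key=q_row.__getitem__)
    return rng.randrange(len(q_row))


def q_update(q, s, a, reward, s_next, n, discount=0.9):
    # Standard tabular Q-learning target, scaled by the decaying step size.
    delta = reward + discount * max(q[s_next]) - q[s][a]
    q[s][a] += step_size(n) * delta
    return q[s][a]


if __name__ == "__main__":
    # Print both schedules at a few iteration counts to see how slowly they move.
    for n in (1, 10, 100, 10000):
        print(n, round(schedule_value(n), 3), round(step_size(n), 3))
```

Under these assumptions the first value rises slowly toward 1 and the step size falls slowly toward 0 as n grows, so the agent would act mostly at random early on and mostly greedily later.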