sawcordwell / pymdptoolbox

Markov Decision Process (MDP) Toolbox for Python
BSD 3-Clause "New" or "Revised" License
518 stars 252 forks source link

Stopping Criteria || pymdptoolbox || boundIter() function #33

Open 3shmawei opened 4 years ago

3shmawei commented 4 years ago

I was trying to understand the source code of boundIter() function which computes a bound for the number of iterations as per Proposition 6.6.5 in “Markov Decision Processes”, M. L. Puterman, 1994:

image

Pymdptoolbox: https://github.com/sawcordwell/pymdptoolbox/blob/master/src/mdptoolbox/mdp.py

image

Although the Hajnal measure (k) is defined by Theorem 6.6.6, and its implementation code is clear, I can’t understand the overall meaning behind the formula:

image

It’s very difficult for me to relate the equation used to calculate maximum iteration to the formula used in the code:

max_iter = (log((epsilon (1 - discount) / discount) / span) / log(discount k))

Can you please explain that formula to me, so that I can understand why the logarithm is used and how Hajnal measure (k) can be used to calculate the maximum iteration?

Your Support is much appreciated!