typo in policy iteration algorithm on the site

mpatacchiola / dissecting-reinforcement-learning

Python code, PDFs and resources for the series of posts on Reinforcement Learning which I published on my personal blog

https://mpatacchiola.github.io/blog/

MIT License

609 stars, 175 forks

typo in policy iteration algorithm on the site #2

Closed: ivan-v-kush closed this issue 7 years ago

ivan-v-kush commented 7 years ago

https://mpatacchiola.github.io/blog/2016/12/09/dissecting-reinforcement-learning.html

The function shown in the post should be (taken from the repository sources):

def return_expected_action(p, u, T, v):
    actions_array = np.zeros(4)
    for action in range(4):
        # Expected utility of doing a in state s, according to T and u.
        actions_array[action] = np.sum(np.multiply(u, np.dot(v, T[:, :, action])))
    return np.argmax(actions_array)

Also, the parameter p is never used in the body, so it can be dropped from the signature:

def return_expected_action(u, T, v):
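
For anyone who wants to check the corrected function quickly, here is a minimal sketch on a toy problem. The 3-state transition tensor and utility values are invented for illustration and are not taken from the post. It assumes T is indexed as T[s, s_next, action] and that v is a one-hot vector selecting the current state. Reading the number of actions from T.shape[2] is my own generalisation; the original hard-codes 4.

```python
import numpy as np

def return_expected_action(u, T, v):
    """Return the index of the action with the highest expected utility.

    u: utility vector, shape (n_states,)
    T: transition tensor, assumed shape (n_states, n_states, n_actions)
    v: one-hot vector selecting the current state, shape (n_states,)
    """
    n_actions = T.shape[2]  # assumption: the original hard-codes 4 actions
    actions_array = np.zeros(n_actions)
    for action in range(n_actions):
        # np.dot(v, T[:, :, action]) picks the next-state distribution
        # for the current state; weighting by u gives the expected utility.
        actions_array[action] = np.sum(np.multiply(u, np.dot(v, T[:, :, action])))
    return np.argmax(actions_array)

# Toy data (hypothetical): 3 states, 4 actions, agent in state 0.
u = np.array([0.0, 1.0, -1.0])
T = np.zeros((3, 3, 4))
T[0, :, 0] = [1.0, 0.0, 0.0]  # action 0: stay in state 0
T[0, :, 1] = [0.0, 1.0, 0.0]  # action 1: move to state 1 (utility 1)
T[0, :, 2] = [0.0, 0.0, 1.0]  # action 2: move to state 2 (utility -1)
T[0, :, 3] = [0.5, 0.5, 0.0]  # action 3: 50/50 between states 0 and 1
v = np.array([1.0, 0.0, 0.0])

best = return_expected_action(u, T, v)
print(best)  # action 1 has the highest expected utility (1.0)
```

The expected utilities for the four actions are 0, 1, -1 and 0.5, so the function picks action 1.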

mpatacchiola commented 7 years ago

Fixed, thanks ;-)