adding optimal policy calculation in the value iteration algorithm

ivan-v-kush commented 7 years ago

You could compute the optimal policy right after the generate_graph call in the value iteration algorithm from this post:

https://mpatacchiola.github.io/blog/2016/12/09/dissecting-reinforcement-learning.html

    generate_graph(graph_list)

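    # Optimal policy extraction: pick the greedy action for each state
    pi = np.zeros(tot_states)
    for s in range(tot_states):
        v = np.zeros(tot_states)  # one-hot vector selecting state s
        v[s] = 1.0
        # Argument order must match the definition: (u, T, v)
        pi[s] = return_expected_action(u, T, v)
    pi[5] = np.nan        # NaN: obstacle, no action
    pi[3] = pi[7] = -1    # -1: terminal states
    print(pi)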

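def return_expected_action(u, T, v):
    """Return the action that maximizes the expected utility.

    u: utility vector; T: transition matrix indexed as T[s, s', action];
    v: one-hot vector selecting the current state s.
    """
    actions_array = np.zeros(4)
    for action in range(4):
        # Expected utility of doing this action in state s, according to T and u
        actions_array[action] = np.sum(np.multiply(u, np.dot(v, T[:, :, action])))
    return np.argmax(actions_array)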
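
The same greedy step can also be written without the per-state loop. This is only a minimal sketch, assuming T is indexed as T[s, s_next, action] like in the blog post. The name greedy_policy and the 3-state toy MDP are made up for illustration and are not part of the repository.

```python
import numpy as np

def greedy_policy(u, T):
    """Return the greedy action for every state at once.

    u: utility vector, shape (n_states,)
    T: transition tensor, assumed T[s, s_next, a] = P(s_next | s, a)
    """
    # q[s, a] = sum over s_next of T[s, s_next, a] * u[s_next]
    q = np.einsum("sna,n->sa", T, u)
    return np.argmax(q, axis=1)

# Hypothetical toy MDP: 3 states, 2 actions.
# Action 0 stays in place, action 1 moves to the next state
# (state 2 is absorbing under both actions).
T = np.zeros((3, 3, 2))
for s in range(3):
    T[s, s, 0] = 1.0
    T[s, min(s + 1, 2), 1] = 1.0
u = np.array([0.0, 0.5, 1.0])

pi = greedy_policy(u, T)
print(pi)  # [1 1 0]
```

In state 2 both actions have the same expected utility, and np.argmax returns the first maximum, so the tie resolves to action 0. Terminal and obstacle states would still need to be overwritten afterwards, as in the snippet above.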

mpatacchiola commented 7 years ago

Hi @IvanKush

If you want, you can send a pull request with this modification. I will be happy to merge it into the master branch.

Massimiliano

mpatacchiola commented 5 years ago

Closed for inactivity

mpatacchiola / dissecting-reinforcement-learning

adding optimal policy calculation in the value iteration algorithm #3