qqiang00 / Reinforce

Reinforcement Learning Algorithm Package & PuckWorld, GridWorld Gym environments

The compute_q code for verifying the Bellman equation in section 2.4.2 of main-RL-QiangYe.pdf #4

Open weimin0812 opened 5 years ago

weimin0812 commented 5 years ago

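In section 2.4.2 of main-RL-QiangYe.pdf (verifying the Bellman equation), the body of the for loop in the compute_q function should not include the second statement. The reward and the discount factor must be applied once, after the sum over successor states is complete, not on every iteration. Correct version: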

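  def compute_q(MDP, V, s, a):
    S, A, R, P, gamma = MDP
    q_sa = 0
    # Accumulate the expected successor value: sum over s' of P(s'|s,a) * V(s')
    for s_prime in S:
        q_sa += get_prob(P, s, a, s_prime) * get_value(V, s_prime)
    # Outside the loop: add the immediate reward and discount the sum once
    q_sa = get_reward(R, s, a) + gamma * q_sa
    return q_sa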

Version in the PDF:

  def compute_q(MDP, V, s, a):
    S, A, R, P, gamma = MDP
    q_sa = 0
    for s_prime in S:
        q_sa += get_prob(P, s, a, s_prime) * get_value(V, s_prime)
        q_sa = get_reward(R, s, a) + gamma * q_sa
    return q_sa
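
To make the difference concrete, here is a minimal sketch that runs both versions on a toy MDP. Everything in it is an assumption made for illustration: the two-state MDP, its numbers, and the dict-based `get_prob`, `get_reward`, and `get_value` helpers are simplified stand-ins, not the implementations from the repository or the PDF. The names `compute_q_fixed` and `compute_q_pdf` are also hypothetical, used only to tell the two variants apart.

```python
# Hypothetical toy MDP: one action "a" taken from state "s0".
S = ["s0", "s1"]
A = ["a"]
R = {("s0", "a"): 1.0}                                  # immediate reward
P = {("s0", "a", "s0"): 0.5, ("s0", "a", "s1"): 0.5}    # transition probabilities
V = {"s0": 2.0, "s1": 4.0}                              # state values to check against
gamma = 0.5
MDP = (S, A, R, P, gamma)


# Simplified stand-ins for the helpers used in the issue (assumed, dict-based).
def get_prob(P, s, a, s_prime):
    return P.get((s, a, s_prime), 0.0)


def get_reward(R, s, a):
    return R.get((s, a), 0.0)


def get_value(V, s):
    return V.get(s, 0.0)


def compute_q_fixed(MDP, V, s, a):
    """Reward and discount are applied once, after the loop."""
    S, A, R, P, gamma = MDP
    q_sa = 0.0
    for s_prime in S:
        q_sa += get_prob(P, s, a, s_prime) * get_value(V, s_prime)
    q_sa = get_reward(R, s, a) + gamma * q_sa
    return q_sa


def compute_q_pdf(MDP, V, s, a):
    """Reward and discount are re-applied on every iteration (the reported bug)."""
    S, A, R, P, gamma = MDP
    q_sa = 0.0
    for s_prime in S:
        q_sa += get_prob(P, s, a, s_prime) * get_value(V, s_prime)
        q_sa = get_reward(R, s, a) + gamma * q_sa
    return q_sa


q_fixed = compute_q_fixed(MDP, V, "s0", "a")
q_pdf = compute_q_pdf(MDP, V, "s0", "a")
print(q_fixed, q_pdf)  # prints: 2.5 2.75
```

With these numbers the expected successor value is 0.5 * 2 + 0.5 * 4 = 3, so q(s0, a) = 1 + 0.5 * 3 = 2.5. The PDF version adds the reward and multiplies by gamma once per successor state, which gives 2.75 here, and the result also depends on the order in which S is iterated.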