Closed. nitahhhh closed this issue 7 years ago.

Hi, in the paper, when training the Q-network, the expected Q-value (y) should be reward + GAMMA * Q_value_next_state. However, in this code it seems to be Q_value_this_state. Is there something wrong in the released code? Thank you.
The code for this part is correct. For training the Q-network, we need to calculate two values: q_target and q_value. q_value is produced by the neural-network approximator. q_target is the expected Q-value from the Bellman equation: reward + GAMMA * max(Q_value_new_state).
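For concreteness, here is a minimal sketch of that target computation. The names (q_network, reward_batch, nextState_batch, done_batch) are illustrative and assumed, not the repo's actual API:

```python
import numpy as np

GAMMA = 0.99  # discount factor

def compute_q_target(q_network, reward_batch, nextState_batch, done_batch):
    # Q-values for the *next* states, per the Bellman equation
    q_next = q_network(nextState_batch)       # shape: (batch, n_actions)
    max_q_next = np.max(q_next, axis=1)       # best action value in s'
    # Terminal transitions contribute no future return
    return reward_batch + GAMMA * (1.0 - done_batch) * max_q_next
```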
Yes, I agree. But you put 'Q_value_this_state' (instead of 'Q_value_new_state') in the q_target formula in your code. Thanks for your reply.
In the code, I did not do that. If you look at lines 106-107, you can see how q_target is calculated from the Bellman equation.
You don't use the new observations in the variable 'nextState_batch' after line 87 when calculating q_target. I think the new observations should be used to calculate Q_value_new_state.
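In other words, the fix being asked for looks roughly like the sketch below. The variable names mirror this discussion (nextState_batch), but the repo's actual code may differ:

```python
import numpy as np

GAMMA = 0.99

def q_target_buggy(q_network, reward_batch, state_batch):
    # Uses Q-values of the *current* states: not what the Bellman target needs
    return reward_batch + GAMMA * np.max(q_network(state_batch), axis=1)

def q_target_fixed(q_network, reward_batch, nextState_batch):
    # Uses the new observations in nextState_batch, as pointed out above
    return reward_batch + GAMMA * np.max(q_network(nextState_batch), axis=1)
```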
I have fixed this. I made a mistake when I moved my original code here. Thanks.
Thanks for the prompt reply. I'll close the issue.