mengf1 / PAL

Policy based Active Learning with DQN (EMNLP-2017)
https://bit.ly/2rasQVM

issue in q-learning #1

Closed nitahhhh closed 7 years ago

nitahhhh commented 7 years ago

Hi, in the paper, when training the Q-network, the expected Q-value (y) should be reward + gamma * Q_value_next_state. However, in this code it seems to use Q_value_this_state instead. Is there something wrong in the released code? Thank you.

mengf1 commented 7 years ago

The code for this part is correct. To train the Q-network, we need to calculate two values: q_target and q_value. q_value is computed by a neural-network approximator. q_target is the expected Q-value from the Bellman equation: reward + GAMMA * max(Q_value_new_state).
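The Bellman target described above can be sketched in a few lines of NumPy. This is only an illustration; the function and variable names (`q_target`, `q_values_next_state`, `done`) are not taken from the repository.

```python
import numpy as np

GAMMA = 0.99  # discount factor, an assumed value for illustration

def q_target(reward, q_values_next_state, done):
    """Bellman target: reward + GAMMA * max_a Q(s', a).

    At a terminal state (done == 1.0) there is no next state to
    bootstrap from, so the target reduces to the reward alone.
    """
    return reward + (1.0 - done) * GAMMA * np.max(q_values_next_state)

# reward 1.0, next-state Q-values [0.5, 2.0], non-terminal transition:
print(q_target(1.0, np.array([0.5, 2.0]), done=0.0))  # 1.0 + 0.99 * 2.0 = 2.98
```

The key point of the discussion below is which state's Q-values feed into `np.max`: they must come from the *next* state, not the current one.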

nitahhhh commented 7 years ago

Yes, I agree. But in the q_target formula in your code you use 'Q_value_this_state' instead of 'Q_value_new_state'. Thanks for your reply.

mengf1 commented 7 years ago

The code does not do that. If you look at lines 106-107, you can see how q_target is calculated from the Bellman equation.

nitahhhh commented 7 years ago

The new observations in the variable 'nextState_batch' are never used after line 87 when calculating q_target. I think those new observations should be used to compute Q_value_new_state.
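A minimal sketch of the fix being requested, for a whole minibatch. This is hypothetical code, not the repository's: only the name `nextState_batch` comes from the thread, and the assumed shape is (batch, num_actions) for the Q-value matrix.

```python
import numpy as np

GAMMA = 0.99  # discount factor, an assumed value for illustration

def compute_q_targets(reward_batch, q_next_batch):
    """q_target for each transition: r + GAMMA * max_a Q(s', a).

    q_next_batch must be the Q-values evaluated on nextState_batch
    (the *next* observations), not on the current-state batch --
    using the current states here is exactly the bug discussed above.
    """
    return reward_batch + GAMMA * np.max(q_next_batch, axis=1)

rewards = np.array([0.0, 1.0])
q_next = np.array([[0.1, 0.4],   # Q(s'_1, a) for both actions
                   [0.3, 0.2]])  # Q(s'_2, a) for both actions
targets = compute_q_targets(rewards, q_next)  # [0.396, 1.297]
```

For simplicity the sketch omits the terminal-state case, where the bootstrapped term should be zeroed out.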

mengf1 commented 7 years ago

I fixed this. I made a mistake when I moved my original code here. Thanks.

nitahhhh commented 7 years ago

Thanks for the prompt reply. I'll close the issue.