Closed: Ivanopolo closed this issue 9 years ago.
I've noticed an inconsistency in how mean_q is evaluated compared to the DeepMind code.
According to these lines: https://github.com/soumith/deepmind-atari/blob/master/dqn/NeuralQLearner.lua#L204 https://github.com/soumith/deepmind-atari/blob/master/dqn/NeuralQLearner.lua#L295 the DeepMind code computes v_avg as the average of the per-state maximum Q value, while the deep_q_rl implementation averages over the mean of the Q values. If this wasn't intentional, we should correct it so that the reported action values are consistent with what is shown in the Nature paper.
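To make the difference concrete, here is a minimal NumPy sketch of the two statistics. The variable names (q_vals, mean_q, v_avg) and the toy batch are my own assumptions for illustration, not the actual deep_q_rl or DeepMind code:

```python
import numpy as np

# q_vals holds the network's Q estimates for a small batch of states,
# shaped (batch_size, num_actions). Values are made up for illustration.
q_vals = np.array([[0.1, 0.5, 0.2],
                   [0.3, 0.0, 0.4]])

# deep_q_rl style: average over *all* Q values in the batch.
mean_q = q_vals.mean()             # 0.25

# DeepMind style (v_avg): take the max Q per state, then average.
v_avg = q_vals.max(axis=1).mean()  # 0.45

print(mean_q, v_avg)
```

The two numbers differ whenever the per-action Q values are not identical, so the averaged statistic reported during training is not directly comparable to the Nature paper's curves unless the max-then-average form is used.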