spragunr / deep_q_rl

Theano-based implementation of Deep Q-learning
BSD 3-Clause "New" or "Revised" License

Small bug fix for mean_q #39

Closed Ivanopolo closed 9 years ago

Ivanopolo commented 9 years ago

I've noticed an inconsistency in how mean_q is computed compared to the DeepMind code.

According to these lines: https://github.com/soumith/deepmind-atari/blob/master/dqn/NeuralQLearner.lua#L204 https://github.com/soumith/deepmind-atari/blob/master/dqn/NeuralQLearner.lua#L295 the DeepMind code computes v_avg as the average of the maximum Q value for each state, while the deep_q_rl implementation computes the average of the mean Q value. If that wasn't intentional, we should correct it so that the reported action values are consistent with those shown in the Nature paper.
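To make the difference concrete, here is a minimal NumPy sketch of the two statistics. It is not the code from either repository: the function names and the toy `q_values` array are hypothetical, and it assumes the Q values are laid out with one row per state and one column per action.

```python
import numpy as np


def mean_of_max_q(q_values):
    """Average over states of the largest Q value in each state.

    Assumes q_values has shape (num_states, num_actions).
    """
    q_values = np.asarray(q_values, dtype=float)
    return float(q_values.max(axis=1).mean())


def mean_of_mean_q(q_values):
    """Average of all Q values, across both states and actions."""
    q_values = np.asarray(q_values, dtype=float)
    return float(q_values.mean())


# Toy example (made-up numbers): two states, two actions each.
q_values = np.array([[1.0, 3.0],
                     [2.0, 6.0]])

print(mean_of_max_q(q_values))   # mean of [3.0, 6.0]
print(mean_of_mean_q(q_values))  # mean of [1.0, 3.0, 2.0, 6.0]
```

With these numbers the max-based average is 4.5 and the mean-based average is 3.0. The mean over all actions can never exceed the mean of the per-state maxima, so the two curves are not directly comparable.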