One question about the reinforcement model

nikitasrivatsan / DeepLearningVideoGames


Open keviny opened 8 years ago

keviny commented 8 years ago

Why do you leave GAMMA * np.max(readout_j1_batch[i]) outside of the model?

Why not put this part of the formula into the model, for example as GAMMA * tf.reduce_max(next_q_results, reduction_indices=1)?
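To make the comparison concrete, here is a minimal NumPy sketch of the two formulations. It is only an illustration, not the repository's actual training code: `GAMMA` and `readout_j1_batch` come from the question, while `r_batch`, `terminal_batch`, the function names, and the toy values are assumptions made for this example. The batched max over axis 1 stands in for what `tf.reduce_max(next_q_results, reduction_indices=1)` would compute inside the graph.

```python
import numpy as np

GAMMA = 0.99  # discount factor; value assumed for illustration


def targets_loop(r_batch, terminal_batch, readout_j1_batch):
    """Per-sample targets computed outside the model, as in the question."""
    y_batch = []
    for i in range(len(r_batch)):
        if terminal_batch[i]:
            # Terminal transition: no bootstrapped future value.
            y_batch.append(r_batch[i])
        else:
            y_batch.append(r_batch[i] + GAMMA * np.max(readout_j1_batch[i]))
    return np.array(y_batch)


def targets_vectorized(r_batch, terminal_batch, readout_j1_batch):
    """Same targets using one batched max over the action axis (axis=1)."""
    r = np.asarray(r_batch, dtype=float)
    done = np.asarray(terminal_batch, dtype=bool)
    max_q = np.max(np.asarray(readout_j1_batch), axis=1)
    # (~done) zeroes the future term for terminal transitions.
    return r + GAMMA * max_q * (~done)


# Toy batch: 3 transitions, 2 actions each (hypothetical values).
r_batch = [1.0, 0.0, -1.0]
terminal_batch = [False, False, True]
readout_j1_batch = np.array([[0.5, 2.0], [1.0, -1.0], [3.0, 4.0]])

y_loop = targets_loop(r_batch, terminal_batch, readout_j1_batch)
y_vec = targets_vectorized(r_batch, terminal_batch, readout_j1_batch)
```

Both functions give the same targets, so the choice is mostly about where the computation lives. If the max is moved into the TensorFlow graph, the target would usually need to be wrapped in something like `tf.stop_gradient` so that the loss does not backpropagate through the next-state Q-values; computing it in NumPy outside the model avoids that concern by construction.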