Possible mistake in Deep Q Learning Space Invaders notebook

simoninithomas / Deep_reinforcement_learning_Course

Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch

http://www.simoninithomas.com/deep-rl-course


Possible mistake in Deep Q Learning Space Invaders notebook #51

Open karolisjan opened 5 years ago

karolisjan commented 5 years ago

Hey. Shouldn't self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_)) in the DQN class be self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1), i.e. summed over the action axis so that self.Q has one entry per sample in the batch? Without axis=1, self.Q is a scalar (the sum over the whole batch), while self.target_Q is a vector of batch-size length.
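A minimal sketch of the shape difference, using NumPy's np.sum as a stand-in for tf.reduce_sum (both sum over all elements when no axis is given). The batch values, the names q_values, actions_one_hot and target_q, and the mean-squared-error loss are assumptions made for illustration, not taken from the notebook:

```python
import numpy as np

# Hypothetical batch: 3 samples, 4 possible actions (stands in for self.output).
q_values = np.array([[1.0, 2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0, 8.0],
                     [9.0, 10.0, 11.0, 12.0]])

# One-hot encoding of the action taken in each sample (stands in for self.actions_).
actions_one_hot = np.array([[0.0, 1.0, 0.0, 0.0],
                            [0.0, 0.0, 0.0, 1.0],
                            [1.0, 0.0, 0.0, 0.0]])

# Hypothetical targets, one per sample (stands in for self.target_Q).
target_q = np.array([2.5, 7.0, 9.5])

# No axis: every element is summed, so the whole batch collapses to one scalar.
q_scalar = np.sum(q_values * actions_one_hot)

# axis=1: sum over the action axis only, leaving one Q-value per sample.
q_per_sample = np.sum(q_values * actions_one_hot, axis=1)

# Assumed MSE loss. The scalar version broadcasts silently against the
# target vector, so no shape error is raised even though the result is wrong.
loss_broadcast = np.mean(np.square(target_q - q_scalar))
loss_per_sample = np.mean(np.square(target_q - q_per_sample))

print(np.shape(q_scalar), q_per_sample.shape)
print(loss_broadcast, loss_per_sample)
```

Because broadcasting hides the mismatch, the version without axis=1 would still run; it would just compare every target against the same batch-wide sum.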

ali-ehsan commented 5 years ago

@karolisjan I agree.