Can anyone explain Why we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1) in Deep Q learning with Doom.ipynb

simoninithomas / Deep_reinforcement_learning_Course

Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch

http://www.simoninithomas.com/deep-rl-course


Can anyone explain Why we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1) in Deep Q learning with Doom.ipynb #65

Open ParmpalGill opened 4 years ago

ParmpalGill commented 4 years ago

Why multiply by the action and use reduce_sum instead of argmax?

yonigottesman commented 4 years ago

I think it's because actions is a one-hot vector with a 1 only at the chosen action, so multiplying gives you a vector that is zero everywhere except at one position, which holds the Q-value. The reduce_sum just pulls that number out, because all the other entries are zero. What do you think?
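Here is a small sketch of that masking, written in NumPy rather than TensorFlow so it runs without a session. The array values are made up for illustration; only the names `output` and `actions_` come from the notebook's expression, and `q_taken` / `q_max` are hypothetical names.

```python
import numpy as np

# Made-up Q-values for a batch of 2 states with 3 possible actions each
# (stands in for self.output; the numbers are illustrative only).
output = np.array([[1.5, -0.2, 0.7],
                   [0.1,  2.0, -1.0]])

# One-hot encoding of the action actually taken in each state
# (stands in for self.actions_).
actions_ = np.array([[0.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0]])

# Elementwise multiply zeroes out every entry except the chosen action,
# then summing over axis=1 collapses each row to that single Q-value.
q_taken = np.sum(output * actions_, axis=1)

# For comparison: the largest Q-value in each row, regardless of
# which action was actually taken.
q_max = np.max(output, axis=1)

print(q_taken)
print(q_max)
```

In the first row the two results differ: the action taken was not the greedy one. As far as I understand, that is the reason for not using argmax or max here. The loss has to compare the target with the Q-value of the action that was actually played, and with epsilon-greedy exploration that is not always the action with the highest Q-value.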