Why do we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)?

simoninithomas / Deep_reinforcement_learning_Course

Implementations from the free course Deep Reinforcement Learning with Tensorflow and PyTorch

http://www.simoninithomas.com/deep-rl-course


Why do we have self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)? #61

Open Meur-sault opened 5 years ago

Meur-sault commented 5 years ago

Hi Thomas,

(Since the earlier issue was closed without a proper answer, I'm submitting it again.) I don't understand why we multiply the network output by the actions and then apply tf.reduce_sum:

self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1)

Why aren't we treating self.output itself as the predicted Q-value?
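
To make the question concrete, here is a minimal sketch of what that line computes. It uses NumPy rather than TensorFlow so it runs on its own, and it assumes that self.output has one row per state and one column per action, and that self.actions_ is a one-hot encoding of the action taken. The issue states neither assumption, and the numbers below are made up for illustration.

```python
import numpy as np

# Hypothetical network outputs: one row per state, one column per action.
output = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# Assumed one-hot encoding of the action taken in each state
# (action 1 in the first state, action 2 in the second).
actions_ = np.array([[0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# NumPy counterpart of tf.reduce_sum(tf.multiply(output, actions_), axis=1):
# the element-wise product zeroes every column except the taken action,
# and the sum over axis 1 collapses each row to that one remaining value.
Q = np.sum(output * actions_, axis=1)

# The same values obtained by indexing each row at its taken action.
taken = np.argmax(actions_, axis=1)
Q_indexed = output[np.arange(len(output)), taken]

print(Q)          # Q-value of the taken action, one per state
print(Q_indexed)  # identical to Q under the one-hot assumption
```

Under those assumptions, output holds a Q-value for every action, while Q holds only the Q-value of the action that was actually taken in each state.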