Open Meur-sault opened 5 years ago
Hi Thomas,
(Since this issue got resolved without any proper answer, I'm submitting it again.) I don't understand why we apply tf.reduce_sum and multiply the network output by the actions.
self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions), axis=1)
Why aren't we treating self.output itself as the predicted Q value?