Hey. Shouldn't self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_)) in the DQN class be self.Q = tf.reduce_sum(tf.multiply(self.output, self.actions_), axis=1), i.e. summed over the action axis so that self.Q has one entry per sample in the batch? Without axis=1, self.Q is a single scalar while self.target_Q is a vector whose length is the batch size.
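
To illustrate what I mean, here is a minimal sketch of the shape difference. It uses NumPy as a stand-in for the TensorFlow ops (np.sum takes the same axis argument as tf.reduce_sum), and it assumes self.output has shape [batch_size, num_actions] and self.actions_ is one-hot; the numbers and the variable names output, actions and target_q are made up for the example.

```python
import numpy as np

# Hypothetical batch of 2 samples with 3 actions each (stand-in for self.output).
output = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
# One-hot encoded chosen actions (stand-in for self.actions_).
actions = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
# Made-up targets, one per sample (stand-in for self.target_Q).
target_q = np.array([2.5, 5.5])

# Without an axis: everything is summed into one scalar.
q_scalar = np.sum(output * actions)

# With axis=1: one Q-value per sample, picked out by the one-hot mask.
q_per_sample = np.sum(output * actions, axis=1)

# The scalar silently broadcasts against the target vector, so no error
# is raised, but the loss compares every target with the same summed value.
loss_scalar = np.mean(np.square(target_q - q_scalar))
loss_per_sample = np.mean(np.square(target_q - q_per_sample))

print(q_scalar.shape, q_per_sample.shape)
print(loss_scalar, loss_per_sample)
```

Since the scalar version broadcasts instead of failing, the code still runs, which is probably why this is easy to miss.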