joyousrabbit opened this issue 7 years ago
Hello, in game_ac_network.py, def prepare_loss(self, entropy_beta), you have:
# temporary difference (R-V) (input for policy)
self.td = tf.placeholder("float", [None])
...
value_loss = 0.5 * tf.nn.l2_loss(self.r - self.v)
But td == self.r - self.v, right?
So why not use self.td directly in the value loss instead of recalculating it from self.v? Also, why not pass pi in as a placeholder as well?
Hoping for a reply, thanks.
Because self.td is fed in as plain numbers from the training loop and is only used in the policy gradient, where the advantage must be treated as a constant so that the policy loss does not push gradients back into the value estimate. The critic loss, on the other hand, has to be built from the graph tensor self.v (as self.r - self.v) so that gradients do flow into the value head and train it.
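To make that distinction concrete, here is a minimal sketch of a prepare_loss-style graph. The network, layer sizes, and placeholder names below are illustrative stand-ins, not the repo's actual code; the point is only that td enters as a placeholder (constant with respect to the graph) while the value loss uses the tensor v directly.

```python
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # graph mode, matching the TF1-style code in the repo

STATE_SIZE, ACTION_SIZE = 8, 4  # toy sizes, for illustration only

# Tiny stand-in network (the real network uses conv/LSTM layers).
s = tf.compat.v1.placeholder(tf.float32, [None, STATE_SIZE], name="s")
w_pi = tf.Variable(tf.random.normal([STATE_SIZE, ACTION_SIZE], stddev=0.1))
w_v = tf.Variable(tf.random.normal([STATE_SIZE, 1], stddev=0.1))
pi = tf.nn.softmax(tf.matmul(s, w_pi))    # policy head
v = tf.reshape(tf.matmul(s, w_v), [-1])   # value head

# Inputs fed from the training loop.
a = tf.compat.v1.placeholder(tf.float32, [None, ACTION_SIZE], name="a")  # one-hot actions taken
td = tf.compat.v1.placeholder(tf.float32, [None], name="td")  # advantage R - V, fed as plain numbers
r = tf.compat.v1.placeholder(tf.float32, [None], name="r")    # discounted return R

log_pi = tf.math.log(tf.clip_by_value(pi, 1e-20, 1.0))

# Policy loss: td is a placeholder, i.e. a constant w.r.t. the graph,
# so minimizing this term does not move the value-head weights.
policy_loss = -tf.reduce_sum(tf.reduce_sum(log_pi * a, axis=1) * td)

# Value loss: built from the tensor v, so gradients flow into the value head.
value_loss = 0.5 * tf.nn.l2_loss(r - v)

total_loss = policy_loss + value_loss
```

If the value loss were written with the td placeholder instead of self.r - self.v, it would be a constant in the graph and the critic would never receive any gradient.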