wojzaremba / trpo


The necessity of kl_firstfixed #5

Closed: namiyao closed this issue 8 years ago

namiyao commented 8 years ago

Hi Wojciech,

    kl = tf.reduce_sum(oldaction_dist * tf.log((oldaction_dist + eps) / (action_dist_n + eps))) / Nf
    # KL divergence where the first argument is held fixed:
    # same as kl above, but with oldaction_dist replaced by tf.stop_gradient(action_dist_n)
    kl_firstfixed = tf.reduce_sum(tf.stop_gradient(
        action_dist_n) * tf.log(tf.stop_gradient(action_dist_n + eps) / (action_dist_n + eps))) / Nf

I think kl_firstfixed is exactly the same as kl, since the feed is

    feed = {self.obs: obs_n,
            self.action: action_n,
            self.advant: advant_n,
            self.oldaction_dist: action_dist_n}

Why not just use kl instead of kl_firstfixed, for simplicity and to save computation?
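Concretely, with that feed the two expressions evaluate to the same number. A minimal NumPy sketch of the value check (the array contents are made up for illustration, and the names just mirror the snippet above rather than the repo's actual tensors):

    import numpy as np

    eps = 1e-8
    # hypothetical stand-in for action_dist_n: a batch of N=2 states, 3 actions
    action_dist_n = np.array([[0.2, 0.5, 0.3],
                              [0.1, 0.1, 0.8]])
    Nf = float(action_dist_n.shape[0])

    # kl with oldaction_dist fed the same array, as in the feed above
    oldaction_dist = action_dist_n
    kl = np.sum(oldaction_dist * np.log((oldaction_dist + eps) / (action_dist_n + eps))) / Nf

    # kl_firstfixed: stop_gradient does not change values, only gradients,
    # so numerically it is the same expression
    kl_firstfixed = np.sum(action_dist_n * np.log((action_dist_n + eps) / (action_dist_n + eps))) / Nf

    print(kl, kl_firstfixed)  # both 0.0 here: each term is p * log(1)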

wojzaremba commented 8 years ago

The values of kl and kl_firstfixed are the same, but their derivatives are not. As far as I remember, this trick had to do with the computation of the natural gradient.
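For context, the first-argument-fixed KL is what the Fisher-vector product is built from when computing the natural gradient direction by double backprop. Below is a minimal sketch of that use, written with a toy softmax policy and TF2's GradientTape rather than the repo's TF1 graph code, so all names here are illustrative, not the repo's:

    import tensorflow as tf

    # toy softmax "policy" over 3 actions, parameterised directly by logits
    theta = tf.Variable([0.5, -0.3, 0.1])

    def fisher_vector_product(p):
        """Return F @ p, where F is the Fisher matrix of the policy at theta,
        obtained by differentiating the first-argument-fixed KL twice."""
        with tf.GradientTape() as outer:
            with tf.GradientTape() as inner:
                pi = tf.nn.softmax(theta)
                # KL(stop_gradient(pi) || pi): numerically zero, but its Hessian
                # w.r.t. theta is the Fisher information matrix at theta
                kl_firstfixed = tf.reduce_sum(
                    tf.stop_gradient(pi) * tf.math.log(tf.stop_gradient(pi) / pi))
            grad_kl = inner.gradient(kl_firstfixed, theta)
            gvp = tf.reduce_sum(grad_kl * p)   # (d kl / d theta) . p
        return outer.gradient(gvp, theta)      # d(gvp) / d theta = F @ p

    print(fisher_vector_product(tf.constant([1.0, 0.0, 0.0])).numpy())

Because tf.stop_gradient freezes the first argument, the second derivative of this KL at the current parameters is exactly the Fisher information matrix, which is what conjugate gradient needs to solve for the natural gradient step.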

For instance, d/dx (x^2) is not equal to d/dx (x * stop_gradient(x)). The first one is equal to 2x, and the second to x.
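The same point in runnable form (a TF2-style sketch, since tf.stop_gradient behaves the same way there; x is just a scalar variable for illustration):

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape(persistent=True) as tape:
        y1 = x * x                    # x^2
        y2 = x * tf.stop_gradient(x)  # same value as x^2, different gradient

    print(y1.numpy(), y2.numpy())        # 9.0 9.0  -- values agree
    print(tape.gradient(y1, x).numpy())  # 6.0      -- d/dx x^2 = 2x
    print(tape.gradient(y2, x).numpy())  # 3.0      -- stop_gradient blocks one factor, leaving x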