wojzaremba / trpo


The necessity of kl_firstfixed #5

Closed: namiyao closed this issue 8 years ago

namiyao commented 8 years ago

Hi Wojciech,

    kl = tf.reduce_sum(oldaction_dist * tf.log((oldaction_dist + eps) / (action_dist_n + eps))) / Nf
    # KL divergence where the first argument is held fixed:
    # same as kl above, but with oldaction_dist replaced by tf.stop_gradient(action_dist_n)
    kl_firstfixed = tf.reduce_sum(tf.stop_gradient(
        action_dist_n) * tf.log(tf.stop_gradient(action_dist_n + eps) / (action_dist_n + eps))) / Nf

I think kl_firstfixed is exactly the same as kl, since the feed is

    feed = {self.obs: obs_n,
            self.action: action_n,
            self.advant: advant_n,
            self.oldaction_dist: action_dist_n}

Why not just use kl instead of kl_firstfixed, for simplicity and to save computation?
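Concretely, with that feed the two expressions evaluate to the same number. A minimal NumPy sketch of the value check (the array contents are made up for illustration, and the names just mirror the snippet above rather than the repo's actual tensors):

    import numpy as np

    eps = 1e-8
    # hypothetical stand-in for action_dist_n: a batch of N=2 states, 3 actions
    action_dist_n = np.array([[0.2, 0.5, 0.3],
                              [0.1, 0.1, 0.8]])
    Nf = float(action_dist_n.shape[0])

    # kl with oldaction_dist fed the same array, as in the feed above
    oldaction_dist = action_dist_n
    kl = np.sum(oldaction_dist * np.log((oldaction_dist + eps) / (action_dist_n + eps))) / Nf

    # kl_firstfixed: stop_gradient does not change values, only gradients,
    # so numerically it is the same expression
    kl_firstfixed = np.sum(action_dist_n * np.log((action_dist_n + eps) / (action_dist_n + eps))) / Nf

    print(kl, kl_firstfixed)  # both 0.0 here: each term is p * log(1)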

wojzaremba commented 8 years ago

The values of kl and kl_firstfixed are the same, but their derivatives are not. As far as I remember, this trick had to do with the computation of the natural gradient.
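For context, the first-argument-fixed KL is what the Fisher-vector product is built from when computing the natural gradient direction by double backprop. Below is a minimal sketch of that use, written with a toy softmax policy and TF2's GradientTape rather than the repo's TF1 graph code, so all names here are illustrative, not the repo's:

    import tensorflow as tf

    # toy softmax "policy" over 3 actions, parameterised directly by logits
    theta = tf.Variable([0.5, -0.3, 0.1])

    def fisher_vector_product(p):
        """Return F @ p, where F is the Fisher matrix of the policy at theta,
        obtained by differentiating the first-argument-fixed KL twice."""
        with tf.GradientTape() as outer:
            with tf.GradientTape() as inner:
                pi = tf.nn.softmax(theta)
                # KL(stop_gradient(pi) || pi): numerically zero, but its Hessian
                # w.r.t. theta is the Fisher information matrix at theta
                kl_firstfixed = tf.reduce_sum(
                    tf.stop_gradient(pi) * tf.math.log(tf.stop_gradient(pi) / pi))
            grad_kl = inner.gradient(kl_firstfixed, theta)
            gvp = tf.reduce_sum(grad_kl * p)   # (d kl / d theta) . p
        return outer.gradient(gvp, theta)      # d(gvp) / d theta = F @ p

    print(fisher_vector_product(tf.constant([1.0, 0.0, 0.0])).numpy())

Because tf.stop_gradient freezes the first argument, the second derivative of this KL at the current parameters is exactly the Fisher information matrix, which is what conjugate gradient needs to solve for the natural gradient step.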

For instance, d/dx (x^2) is not equal to d/dx (x * stop_gradient(x)). The first one is equal to 2x, and the second to x.
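The same point in runnable form (a TF2-style sketch, since tf.stop_gradient behaves the same way there; x is just a scalar variable for illustration):

    import tensorflow as tf

    x = tf.Variable(3.0)
    with tf.GradientTape(persistent=True) as tape:
        y1 = x * x                    # x^2
        y2 = x * tf.stop_gradient(x)  # same value as x^2, different gradient

    print(y1.numpy(), y2.numpy())        # 9.0 9.0  -- values agree
    print(tape.gradient(y1, x).numpy())  # 6.0      -- d/dx x^2 = 2x
    print(tape.gradient(y2, x).numpy())  # 3.0      -- stop_gradient blocks one factor, leaving x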