wojzaremba / trpo

About kl_firstfixed #7

Open PeiYingjun opened 6 years ago

PeiYingjun commented 6 years ago

Thanks for the implementation of TRPO. Some details do not make sense to me so far. I can't see why kl_firstfixed is defined as follows:

    kl_firstfixed = tf.reduce_sum(tf.stop_gradient(action_dist_n) * tf.log(tf.stop_gradient(action_dist_n + eps) / (action_dist_n + eps))) / Nf

It seems that this does not use oldaction_dist at all. Shouldn't it be:

    kl_firstfixed = tf.reduce_sum(tf.stop_gradient(oldaction_dist) * tf.log(tf.stop_gradient(oldaction_dist + eps) / (action_dist_n + eps))) / Nf

Besides, why does losses contain the entropy of action_dist_n? Why must it be minimized?

PeiYingjun commented 6 years ago

Sorry, I mean I think it should be:

    kl_firstfixed = tf.reduce_sum(tf.stop_gradient(oldaction_dist) * tf.log(tf.stop_gradient(oldaction_dist + eps) / (oldaction_dist + eps))) / Nf

PeiYingjun commented 6 years ago

All right, after a quick analysis, I think it's reasonable to use the first definition of kl_firstfixed. Yet I'm still confused about the losses: why do we try to minimize three values?
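
On the first point, here is a rough sketch of the kind of analysis I mean (my own reading, not necessarily the author's reasoning). The KL between a distribution and a gradient-stopped copy of itself is zero in value and has zero gradient at the current parameters, but its second derivatives are the Fisher information matrix, which as far as I can tell is what the conjugate-gradient step in TRPO needs. The example assumes a softmax policy over three actions with made-up logits theta0, and uses finite differences in place of TensorFlow's automatic differentiation. The helpers gradient_fd and hessian_fd are illustrative names of mine, not code from the repository, and the fixed_probs argument plays the role of tf.stop_gradient(action_dist_n).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_firstfixed(fixed_probs, logits):
    """KL(fixed_probs || softmax(logits)), with the first argument held constant."""
    probs = softmax(logits)
    return sum(f * math.log(f / p) for f, p in zip(fixed_probs, probs))


def gradient_fd(f, x, h=1e-3):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def hessian_fd(f, x, h=1e-3):
    """Central finite-difference Hessian of f at x."""
    n = len(x)

    def shifted(i, j, di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)

    return [
        [
            (shifted(i, j, h, h) - shifted(i, j, h, -h)
             - shifted(i, j, -h, h) + shifted(i, j, -h, -h)) / (4 * h * h)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical logits; the "fixed" distribution is the policy at theta0.
theta0 = [0.5, -0.2, 1.0]
p0 = softmax(theta0)


def objective(theta):
    return kl_firstfixed(p0, theta)


value = objective(theta0)             # KL of the policy with itself
grad = gradient_fd(objective, theta0)
hess = hessian_fd(objective, theta0)

# Fisher information of a softmax w.r.t. its logits: diag(p) - p p^T.
n = len(p0)
fisher = [
    [(p0[i] if i == j else 0.0) - p0[i] * p0[j] for j in range(n)]
    for i in range(n)
]

print("value:", value)
print("gradient:", grad)
print("hessian:", hess)
print("fisher:", fisher)
```

If this is right, replacing the fixed copy with oldaction_dist in both places would leave nothing depending on the parameters, so there would be nothing to differentiate.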