Hi,
There is a small mistake in policy.py where you calculate the KL divergence between two multivariate normal distributions:
```python
self.kl = 0.5 * tf.reduce_mean(log_det_cov_new - log_det_cov_old + tr_old_new +
                               tf.reduce_sum(tf.square(self.means - self.old_means_ph) /
                                             tf.exp(self.log_vars), axis=1) -
                               self.act_dim)
```
The ratio of the covariances, i.e. `tr_old_new`, should be squared in the KL divergence; in other words, `tr_old_new` just needs to be replaced with `tr_old_new**2`.
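For context, here is a minimal, self-contained NumPy sketch of the same computation for diagonal Gaussians (my own illustration, not code from the repository). It assumes `tr_old_new` is the summed ratio of old to new variances, `sum(exp(old_log_vars - log_vars))`, as its name suggests; the array values are made up, and the term this report proposes to square is marked in a comment.

```python
import numpy as np

# Hypothetical diagonal-Gaussian parameters standing in for the TensorFlow
# tensors in policy.py (values made up for illustration).
old_means = np.array([[0.0, 1.0], [0.5, -0.5]])   # old policy means, shape (batch, act_dim)
means = np.array([[0.1, 0.9], [0.4, -0.4]])       # new policy means
old_log_vars = np.array([-1.0, 0.5])              # old log-variances, shared across the batch
log_vars = np.array([-0.8, 0.3])                  # new log-variances
act_dim = means.shape[1]

# Terms mirroring the expression quoted above.
log_det_cov_old = np.sum(old_log_vars)
log_det_cov_new = np.sum(log_vars)
tr_old_new = np.sum(np.exp(old_log_vars - log_vars))   # <-- term this issue proposes to square
maha = np.sum(np.square(means - old_means) / np.exp(log_vars), axis=1)

kl = 0.5 * np.mean(log_det_cov_new - log_det_cov_old + tr_old_new + maha - act_dim)
print(kl)
```

Applying the proposed change would amount to replacing `tr_old_new` with `tr_old_new**2` in the final expression above.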