wojzaremba / trpo


KL divergence always bigger than constraint #2

Closed kvfrans closed 8 years ago

kvfrans commented 8 years ago

I'm trying to reproduce results on Copy-v0.

surrafter, kloldnew, entropy = self.session.run(
    self.losses, feed_dict=feed)
if kloldnew > 2.0 * config.max_kl:
    self.sff(thprev)

The if branch here is always taken: the KL between the old and new distribution always exceeds 2 * max_kl (max_kl = 0.01), so the parameters are reset to thprev and no changes are ever made to the policy.

********** Iteration 1 ************
Total number of episodes:                 784
KL between old and new distribution:      0.0506147 (this is greater than 2 * 0.01)
Entropy:                                  2.912
Surrogate loss:                           -0.210527
Average sum of rewards per episode:       -0.309113300493
Baseline explained:                       -0.0618615207653
Time elapsed:                             0.07 mins

I am running the script with python main.py Copy-v0
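For context, here is a minimal sketch of what that guard does, assuming (as the snippet suggests) that sff writes a flat parameter vector back into the policy network. The names apply_update and theta_* are hypothetical, introduced only for illustration:

```python
# Hypothetical sketch of the guard in the update loop: if the step's KL
# overshoots twice the trust-region limit, revert to the old parameters.
def apply_update(theta_prev, theta_new, kl_old_new, max_kl, sff):
    if kl_old_new > 2.0 * max_kl:
        sff(theta_prev)   # write the old flat parameter vector back
        return theta_prev
    return theta_new

# Stand-in for the network's parameter setter.
params = {}
sff = lambda th: params.update(current=th)

# With KL 0.05 > 2 * 0.01 (the numbers from the log above),
# the old parameters are restored and the policy never moves.
out = apply_update([0.0], [1.0], kl_old_new=0.05, max_kl=0.01, sff=sff)
```

When every full step overshoots like this, the guard fires on every iteration and training stalls, which matches the behavior reported above.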

wojzaremba commented 8 years ago

Try changing max_kl then.

Someone wrote in a previous GitHub issue that the current code doesn't do a line search. Adding one could help as well.
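For reference, a backtracking line search shrinks the proposed step until the surrogate loss improves and the KL constraint holds, instead of taking the full step and reverting. A minimal self-contained sketch (the function names and the toy 1-D Gaussian example are illustrative, not code from this repo):

```python
def linesearch(f, kl, x0, fullstep, max_kl,
               backtrack_ratio=0.5, max_backtracks=10):
    """Shrink fullstep geometrically until the loss f improves
    AND the KL constraint is satisfied; otherwise keep x0."""
    f0 = f(x0)
    for i in range(max_backtracks):
        x = x0 + (backtrack_ratio ** i) * fullstep
        if f(x) < f0 and kl(x) <= max_kl:
            return x
    return x0  # no acceptable step found: leave the policy unchanged

# Toy example: optimize the mean of a unit-variance 1-D Gaussian.
# KL(N(x, 1) || N(x0, 1)) = (x - x0)^2 / 2.
x0 = 0.0
f = lambda x: (x - 1.0) ** 2        # hypothetical loss to minimize
kl = lambda x: (x - x0) ** 2 / 2
x_new = linesearch(f, kl, x0, fullstep=1.0, max_kl=0.01)
```

With max_kl = 0.01 the full step (KL = 0.5) is rejected and the step is halved until the constraint is met, so the policy still makes progress each iteration rather than being reset.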

zhongwen commented 8 years ago

Hi, you may find https://github.com/wojzaremba/trpo/pull/3 helpful.

wojzaremba commented 8 years ago

Merged.