Fix linesearch and KL divergence constraint

Fixes #1 and #2.

Results are available at: https://gym.openai.com/evaluations/eval_yumMelmmSNWeM4RexXZX5g#reproducibility

Example logs:

********** Iteration 2 ************
Total number of episodes:                 1166
KL between old and new distribution:      0.00500573
Entropy:                                  2.94753
Surrogate loss:                           -0.0609315
Average sum of rewards per episode:       -0.198684210526
Baseline explained:                       -0.0523932738426
Time elapsed:                             0.20 mins
Rollout

********** Iteration 3 ************
Total number of episodes:                 1549
KL between old and new distribution:      0.00585226
Entropy:                                  2.92717
Surrogate loss:                           -0.0543403
Average sum of rewards per episode:       -0.195822454308
Baseline explained:                       -0.0704836218784
Time elapsed:                             0.29 mins
Rollout
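
For readers unfamiliar with the two pieces named in the title, here is a minimal sketch of how a backtracking line search can enforce a KL divergence constraint. It is not the code from this repository: `backtracking_linesearch`, `loss_fn`, `kl_fn`, `max_kl`, `max_backtracks`, and `shrink` are all hypothetical names, and the toy quadratic loss and squared-distance "KL" are stand-ins chosen only so the example runs without a policy network.

```python
def backtracking_linesearch(loss_fn, kl_fn, theta, full_step, max_kl,
                            max_backtracks=10, shrink=0.5):
    """Shrink the step until the loss improves AND the KL constraint holds.

    Returns (new_theta, accepted). If no step fraction satisfies both
    conditions, the original parameters are returned unchanged.
    All names here are illustrative assumptions, not the repository's API.
    """
    old_loss = loss_fn(theta)
    step_frac = 1.0
    for _ in range(max_backtracks):
        candidate = [t + step_frac * s for t, s in zip(theta, full_step)]
        improved = loss_fn(candidate) < old_loss
        within_trust_region = kl_fn(theta, candidate) <= max_kl
        if improved and within_trust_region:
            return candidate, True
        step_frac *= shrink  # try a smaller step along the same direction
    return list(theta), False


# Toy stand-ins (assumptions): a quadratic loss with its minimum at (1, 1),
# and half the squared distance playing the role of the KL divergence.
def toy_loss(theta):
    return sum((t - 1.0) ** 2 for t in theta)


def toy_kl(old, new):
    return 0.5 * sum((n - o) ** 2 for o, n in zip(old, new))


new_theta, accepted = backtracking_linesearch(
    toy_loss, toy_kl, theta=[0.0, 0.0], full_step=[2.0, 2.0], max_kl=0.3)
print(new_theta, accepted)
```

In this toy run the full step is rejected because it does not reduce the loss, the half step is rejected because it exceeds `max_kl`, and the quarter step is accepted. Checking both conditions on every candidate is the point of the sketch: a step that lowers the loss can still be rejected for leaving the trust region.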
