openai / universe-starter-agent

A starter agent that can solve a number of universe environments.
MIT License

Change sign of entropy penalty for punishment #110

Closed 4SkyNet closed 7 years ago

4SkyNet commented 7 years ago

Entropy should be added to the overall loss with a + sign, since we want to add it as a penalty so that the probability distribution over actions becomes steeper.

entropy = - tf.reduce_sum(prob_tf * log_prob_tf)

The log probabilities above are negative, so the sum inside the brackets is negative, and we put one more - in front of the sum to get a positive value in total.
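As a quick sanity check of that sign, here is a minimal plain-Python sketch (not the repository's TensorFlow code). The `entropy` helper and the two example distributions are made up for illustration; the helper just mirrors the `- tf.reduce_sum(prob_tf * log_prob_tf)` expression for a single distribution.

```python
import math

def entropy(probs):
    # Hypothetical helper: -sum(p * log p) for one action distribution.
    # Terms with p == 0 are skipped, since p * log p tends to 0 there.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Two made-up distributions over 4 actions.
uniform = [0.25, 0.25, 0.25, 0.25]  # as flat as possible
peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic

h_uniform = entropy(uniform)
h_peaked = entropy(peaked)

# Both values are positive; the flat distribution has the larger entropy.
print(h_uniform, h_peaked)
```

Under these assumptions the entropy is positive in both cases and shrinks as the distribution gets steeper.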

So, given the current loss

self.loss = pi_loss + 0.5 * vf_loss - entropy * 0.01

when we sum up all the penalties, we should also add entropy as a penalty.

To my mind, the correct formula should look as follows:

self.loss = pi_loss + 0.5 * vf_loss + entropy * 0.01
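To make the effect of the two signs concrete, here is a rough plain-Python sketch. Everything in it is an assumption for illustration: `entropy`, `loss_minus`, and `loss_plus` are hypothetical helpers, and `pi_loss` and `vf_loss` are held fixed at made-up values, which would not happen in real training because they change together with the policy.

```python
import math

def entropy(probs):
    # Hypothetical helper: -sum(p * log p) for one action distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def loss_minus(pi_loss, vf_loss, h, beta=0.01):
    # Loss with the entropy term subtracted.
    return pi_loss + 0.5 * vf_loss - beta * h

def loss_plus(pi_loss, vf_loss, h, beta=0.01):
    # Loss with the entropy term added.
    return pi_loss + 0.5 * vf_loss + beta * h

# Made-up fixed values, only so the entropy term can be compared in isolation.
pi_loss, vf_loss = 1.0, 2.0

h_flat = entropy([0.25, 0.25, 0.25, 0.25])
h_steep = entropy([0.97, 0.01, 0.01, 0.01])

# With "-", the flatter distribution gives the smaller loss.
minus_flat = loss_minus(pi_loss, vf_loss, h_flat)
minus_steep = loss_minus(pi_loss, vf_loss, h_steep)

# With "+", the steeper distribution gives the smaller loss.
plus_flat = loss_plus(pi_loss, vf_loss, h_flat)
plus_steep = loss_plus(pi_loss, vf_loss, h_steep)

print(minus_flat, minus_steep, plus_flat, plus_steep)
```

So, under these assumptions, minimizing the version with - pushes the policy toward a flatter distribution over actions, while minimizing the version with + pushes it toward a steeper one.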