Entropy should be added to the overall loss with a + sign, since we want it to act as a penalty that pushes the probability distribution over actions to become steeper.

entropy = - tf.reduce_sum(prob_tf * log_prob_tf)

The log-probabilities inside the brackets are negative, and the extra minus sign in front of the sum makes the resulting entropy value positive.
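As a quick numerical check of that sign argument, here is a minimal NumPy sketch (the distribution values are made up for illustration):

```python
import numpy as np

# Hypothetical action distribution (e.g. a softmax output); values are illustrative.
prob = np.array([0.7, 0.2, 0.1])
log_prob = np.log(prob)            # log-probabilities are all <= 0

# The inner sum prob * log_prob is negative, so the leading minus
# makes the entropy a positive number.
entropy = -np.sum(prob * log_prob)
print(entropy)                     # positive value

# A uniform distribution (least "steep") gives the maximal entropy, log(3).
uniform = np.full(3, 1.0 / 3.0)
print(-np.sum(uniform * np.log(uniform)))
```

A steeper (more peaked) distribution yields a smaller entropy, which is exactly why adding it as a penalty with a + sign pushes the policy toward sharper action probabilities.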
So, when we sum up all the penalties, we should also add the entropy as a penalty. To my mind, the right formula should look as follows: