wisnunugroho21 / reinforcement_learning_ppo_rnd

Deep Reinforcement Learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch, with some explanation
GNU General Public License v3.0

Entropy calculation not useful #8

Open Myrkiriad-coder opened 3 years ago

Myrkiriad-coder commented 3 years ago

Describe the bug
In ppo_continous_tensorflow.py, entropy is calculated with dist_entropy = tf.math.reduce_mean(self.distributions.entropy(action_mean, self.std)). Since the entropy depends only on std, and std is a static parameter, dist_entropy always has the same value. Thus, the entropy loss has no effect on learning.
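For reference, the entropy of a diagonal Gaussian policy is 0.5 * log(2 * pi * e * std^2) per action dimension, which involves only the standard deviation. Here is a minimal sketch (not code from this repository) showing that with a fixed std the entropy is a constant, so its gradient with respect to the policy parameters is zero:

```python
import math
import tensorflow as tf

def gaussian_entropy(std):
    # Entropy of a diagonal Gaussian: 0.5 * log(2 * pi * e * std^2) per dimension.
    # Note that it depends only on std, never on the action mean.
    return 0.5 * tf.math.log(2.0 * math.pi * math.e * tf.square(std))

std = tf.constant([1.0, 1.0])       # static std, as in the issue
mean_a = tf.constant([0.3, -1.2])   # two very different action means...
mean_b = tf.constant([5.0, 2.0])

# ...yet the entropy is identical in both cases, so dist_entropy is the same
# constant for every batch and the entropy bonus cannot influence learning.
print(gaussian_entropy(std))
```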

To Reproduce
Launch any env and set a breakpoint on dist_entropy. Check that it has the same value for every batch at any point during learning.

Expected behavior
Std should not be static; it should somehow represent the network's real prediction confidence.

wisnunugroho21 commented 3 years ago

Sorry for the very late reply. Thank you for the advice. Actually, if you use a static std parameter, you can just set the entropy coefficient to 0.

I really think using a neural network to calculate the std is much better than using static parameters. I forgot to do that in this repository.
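A minimal sketch of that idea, assuming TensorFlow 2 / Keras (the class and layer names below are illustrative, not taken from the repository): the network predicts log_std alongside the action mean, so the std becomes state-dependent and trainable. A simpler variant is a single trainable tf.Variable for log_std shared across all states.

```python
import tensorflow as tf

class GaussianPolicy(tf.keras.Model):
    # Sketch of a policy with a learned, state-dependent std.
    def __init__(self, action_dim, hidden=64):
        super().__init__()
        self.body = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden, activation='relu'),
            tf.keras.layers.Dense(hidden, activation='relu'),
        ])
        self.mean_head = tf.keras.layers.Dense(action_dim, activation='tanh')
        self.log_std_head = tf.keras.layers.Dense(action_dim)  # predict log_std, then exponentiate

    def call(self, states):
        x = self.body(states)
        action_mean = self.mean_head(x)
        # Clip log_std for numerical stability, then exponentiate so std > 0.
        std = tf.exp(tf.clip_by_value(self.log_std_head(x), -20.0, 2.0))
        return action_mean, std
```

With either variant, self.distributions.entropy(action_mean, std) depends on trainable parameters, so the entropy coefficient can actually regularize exploration.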