wisnunugroho21 / reinforcement_learning_ppo_rnd

Deep Reinforcement Learning using Proximal Policy Optimization and Random Network Distillation in TensorFlow 2 and PyTorch, with some explanation
GNU General Public License v3.0

Entropy calculation not useful #8

Open Myrkiriad-coder opened 3 years ago

Myrkiriad-coder commented 3 years ago

Describe the bug
In ppo_continous_tensorflow.py, entropy is calculated with dist_entropy = tf.math.reduce_mean(self.distributions.entropy(action_mean, self.std)). Since the entropy depends only on std, and std is a static parameter, dist_entropy always has the same value. Thus, the entropy loss has no effect on learning.
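For reference, the entropy of a diagonal Gaussian policy is 0.5 * log(2 * pi * e * std^2) per action dimension, which involves only the standard deviation. Here is a minimal sketch (not code from this repository) showing that with a fixed std the entropy is a constant, so its gradient with respect to the policy parameters is zero:

```python
import math
import tensorflow as tf

def gaussian_entropy(std):
    # Entropy of a diagonal Gaussian: 0.5 * log(2 * pi * e * std^2) per dimension.
    # Note that it depends only on std, never on the action mean.
    return 0.5 * tf.math.log(2.0 * math.pi * math.e * tf.square(std))

std = tf.constant([1.0, 1.0])       # static std, as in the issue
mean_a = tf.constant([0.3, -1.2])   # two very different action means...
mean_b = tf.constant([5.0, 2.0])

# ...yet the entropy is identical in both cases, so dist_entropy is the same
# constant for every batch and the entropy bonus cannot influence learning.
print(gaussian_entropy(std))
```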

To Reproduce
Launch any env and set a breakpoint on dist_entropy. Check that it has the same value for every batch at any point during learning.

Expected behavior
Std should not be static; it should somehow represent the network's real prediction confidence.

wisnunugroho21 commented 3 years ago

Sorry for the very late reply. Thank you for the advice. Actually, if you use a static std parameter, you can just set the entropy coefficient to 0.

I really think using a neural network to calculate the std is much better than using static parameters. I forgot to do that in this repository.
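A minimal sketch of that idea, assuming TensorFlow 2 / Keras (the class and layer names below are illustrative, not taken from the repository): the network predicts log_std alongside the action mean, so the std becomes state-dependent and trainable. A simpler variant is a single trainable tf.Variable for log_std shared across all states.

```python
import tensorflow as tf

class GaussianPolicy(tf.keras.Model):
    # Sketch of a policy with a learned, state-dependent std.
    def __init__(self, action_dim, hidden=64):
        super().__init__()
        self.body = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden, activation='relu'),
            tf.keras.layers.Dense(hidden, activation='relu'),
        ])
        self.mean_head = tf.keras.layers.Dense(action_dim, activation='tanh')
        self.log_std_head = tf.keras.layers.Dense(action_dim)  # predict log_std, then exponentiate

    def call(self, states):
        x = self.body(states)
        action_mean = self.mean_head(x)
        # Clip log_std for numerical stability, then exponentiate so std > 0.
        std = tf.exp(tf.clip_by_value(self.log_std_head(x), -20.0, 2.0))
        return action_mean, std
```

With either variant, self.distributions.entropy(action_mean, std) depends on trainable parameters, so the entropy coefficient can actually regularize exploration.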