reinforcement-learning-kr / pg_travel

Policy Gradient algorithms (REINFORCE, NPG, TRPO, PPO)

Why is the log standard deviation fixed to 0? #18

Open · dlrudco opened this issue 3 years ago

dlrudco commented 3 years ago

I see that the actor-critic model (model.py) outputs mu and logstd. In the code, logstd is fixed to 0 via `logstd = torch.zeros_like(mu)`, which fixes the standard deviation at 1 (since std = exp(logstd) = exp(0) = 1). But as far as I know, logstd should also be learned by the network (in that case, logstd would be the output of some layer, or a learnable parameter). Is there any reason for this behavior?
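
For reference, here is a minimal sketch contrasting the two conventions being discussed, assuming a small MLP Gaussian policy. The class name `Actor`, the layer sizes, and the `learn_logstd` flag are illustrative, not taken from the repo's model.py:

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Gaussian policy head; layer sizes here are illustrative."""

    def __init__(self, obs_dim, act_dim, learn_logstd=False):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
        )
        self.mu_head = nn.Linear(64, act_dim)
        self.learn_logstd = learn_logstd
        if learn_logstd:
            # Learned, state-independent logstd: a free parameter that the
            # policy-gradient loss updates alongside the network weights.
            self.logstd = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs):
        h = self.body(obs)
        mu = self.mu_head(h)
        if self.learn_logstd:
            logstd = self.logstd.expand_as(mu)
        else:
            # What the issue describes: logstd pinned to 0, so that
            # std = exp(0) = 1 for every action dimension.
            logstd = torch.zeros_like(mu)
        return mu, logstd
```

Either way, actions can then be sampled with `torch.distributions.Normal(mu, logstd.exp()).sample()`.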