Open LpLegend opened 3 years ago
I have changed the activation function from ReLU to tanh, but there is no improvement.
I don't think this code can solve the problem (Pendulum). Another question: why is the reward computed as 'running_reward 0.9 + score 0.1'?
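For what it's worth, that expression looks like an exponential moving average of the episode score, used as a smoothed progress metric rather than a training signal. A minimal sketch of my reading of it (the episode returns below are made up for illustration):

```python
# Hypothetical reconstruction: running_reward as an exponential moving
# average (EMA) of per-episode scores, weighted 0.9 old / 0.1 new.
running_reward = 0.0
for score in [-1500.0, -1200.0, -900.0]:  # made-up Pendulum episode returns
    running_reward = 0.9 * running_reward + 0.1 * score

print(running_reward)  # -319.5
```

The 0.9/0.1 split just controls how quickly the average tracks recent episodes; it does not change what the agent optimizes.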
I ran into this problem too. I asked the author of elegantrl, and he said that applying tanh first and then sampling the action through torch.distributions affects the entropy, so it cannot converge. But I don't like elegantrl's PPO implementation, so I'm still looking for someone else's code.
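To illustrate the point about tanh and sampling order: a sketch (my own, not elegantrl's code) contrasting squashing the mean before sampling with the tanh-squashed-Gaussian approach, where the sample is squashed afterwards and the log-probability is corrected by the tanh Jacobian (as in SAC). The numbers are arbitrary.

```python
import torch
from torch.distributions import Normal

mu, log_std = torch.tensor(1.5), torch.tensor(-0.5)
std = log_std.exp()

# Variant A (common in simple PPO repos): squash the mean with tanh,
# then sample from a Normal. The sample itself is unbounded, and the
# distribution's entropy no longer describes the bounded action.
dist_a = Normal(torch.tanh(mu), std)
action_a = dist_a.sample()

# Variant B (tanh-squashed Gaussian): sample first, then squash, and
# correct the log-prob with the tanh change-of-variables term.
dist_b = Normal(mu, std)
u = dist_b.sample()
action_b = torch.tanh(u)  # guaranteed to lie in (-1, 1)
log_prob = dist_b.log_prob(u) - torch.log(1 - action_b.pow(2) + 1e-6)
```

Without the correction term in Variant B, the log-probs (and hence the entropy and PPO ratio) refer to the wrong distribution, which matches the convergence complaint above.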
Have you found working code yet? Could you post a link? Much appreciated!
You can change clip_param from 0.2 to 0.1, constraining the trust region more tightly. This worked for me!
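To make the suggestion concrete, here is a sketch of the standard PPO clipped surrogate loss (my own minimal version, not this repo's code), showing how a smaller clip_param caps how far the update can push the policy ratio:

```python
import torch

def ppo_clip_loss(ratio, advantage, clip_param):
    # PPO clipped surrogate objective: a smaller clip_param keeps the
    # new policy closer to the old one (a tighter implicit trust region).
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_param, 1 + clip_param) * advantage
    return -torch.min(unclipped, clipped).mean()

ratio = torch.tensor([1.3])  # new/old policy probability ratio
adv = torch.tensor([1.0])
print(ppo_clip_loss(ratio, adv, 0.2))  # ratio clipped at 1.2 -> loss -1.2
print(ppo_clip_loss(ratio, adv, 0.1))  # ratio clipped at 1.1 -> loss -1.1
```

With clip_param=0.1 the gradient incentive is cut off sooner, which is why tightening it can stabilize training on Pendulum.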