I don't think PPO pendulum is converging

sweetice / Deep-reinforcement-learning-with-pytorch

PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and ....

MIT License


Open Bigpig4396 opened 5 years ago

KT27-A commented 5 years ago

Yes, the problem is that the activation function is chosen incorrectly.

HuangHaoyu1997 commented 4 years ago

I don't think this repo implements PPO correctly either.

NanJuni commented 4 years ago

Change the activation function from ReLU to tanh.

wiluen commented 3 years ago

Right, change ReLU to tanh in the actor network.
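For anyone landing here, a minimal sketch of what that change could look like. This is not the repo's actual code: the class name `Actor`, the layer names, and the hidden size are all hypothetical, and the thread does not say which layer is meant, so the sketch assumes the hidden activation is the one being swapped. It also assumes the standard Gym Pendulum setup (3-dimensional observation, torque in [-2, 2]).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_ACTION = 2.0  # assumed torque bound of the Gym Pendulum environment


class Actor(nn.Module):
    """Hypothetical Gaussian-policy actor; names and sizes are illustrative only."""

    def __init__(self, state_dim=3, hidden_dim=64):
        super().__init__()
        self.fc = nn.Linear(state_dim, hidden_dim)
        self.mu_head = nn.Linear(hidden_dim, 1)
        self.sigma_head = nn.Linear(hidden_dim, 1)

    def forward(self, state):
        # The suggested fix: tanh on the hidden layer instead of F.relu.
        x = torch.tanh(self.fc(state))
        # Assumption: the mean is squashed and scaled into the action range.
        mu = MAX_ACTION * torch.tanh(self.mu_head(x))
        # softplus keeps the standard deviation strictly positive.
        sigma = F.softplus(self.sigma_head(x)) + 1e-5
        return mu, sigma
```

An action can then be sampled with `torch.distributions.Normal(mu, sigma).sample()` and clamped to the action range before being passed to the environment. Whether this alone makes training converge probably also depends on the rest of the PPO update, which this sketch does not cover.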