nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License
1.57k stars, 332 forks

How to improve the performance based on your code? #48

Closed: 4thfever closed this issue 2 years ago

4thfever commented 2 years ago

Hi

Thanks for sharing your code, and for the nice work!

I trained the agent on the CartPole task and it works well. However, neither my agent nor yours is perfect yet (the episode reward is sometimes below 400). I guess this comes from the code's minimalism.

Episode: 1 Reward: 400.0
Episode: 2 Reward: 400.0
Episode: 3 Reward: 400.0
Episode: 4 Reward: 126.0
Episode: 5 Reward: 400.0

What is the recommended way, or trick, to improve my agent's performance based on your code?

Thanks a lot

nikhilbarhate99 commented 2 years ago

Since the environment and policy are stochastic, it cannot be guaranteed that the agent will be successful in every episode.
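To illustrate the point about stochasticity, here is a minimal sketch (not code from this repository) of how a policy that samples its actions will occasionally pick a low-probability action, even when one action is strongly preferred. The probabilities and the helper names `sample_action` and `greedy_action` are hypothetical and chosen only for illustration.

```python
import random


def sample_action(probs, rng):
    """Sample an action index from a categorical distribution (stochastic policy)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_action(probs):
    """Pick the most probable action (deterministic alternative, for comparison)."""
    return max(range(len(probs)), key=lambda a: probs[a])


rng = random.Random(0)   # fixed seed so the run is reproducible
probs = [0.9, 0.1]       # hypothetical action probabilities for a single state

sampled = [sample_action(probs, rng) for _ in range(1000)]
minority = sampled.count(1)

print("greedy action:", greedy_action(probs))
print("times the less likely action was sampled:", minority, "out of 1000")
```

A single unlucky action sampled at a bad moment can end an episode early, which is one plausible reason for an occasional low reward such as the 126.0 above.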

To improve further, you can try training for longer and decreasing min_action_std, so that the policy is less random by the end of training.
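As a rough sketch of what lowering `min_action_std` does, assume a continuous action space where actions are sampled from a Gaussian whose standard deviation is decayed linearly during training and clamped at a floor. The function below is a standalone illustration, not the repository's implementation; the starting value, decay rate, and floor are hypothetical.

```python
def decay_action_std(action_std, decay_rate, min_action_std):
    """Reduce the std by decay_rate, never going below min_action_std."""
    return max(round(action_std - decay_rate, 4), min_action_std)


action_std = 0.6          # hypothetical initial exploration noise
schedule = [action_std]

for _ in range(12):       # 12 hypothetical decay steps
    action_std = decay_action_std(action_std, decay_rate=0.05, min_action_std=0.1)
    schedule.append(action_std)

print(schedule)
```

With a smaller floor, the std keeps shrinking for longer, so actions sampled late in training stay closer to the policy's mean and behave less randomly.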