nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License

Performance of PPO on other projects #36

Closed: pengzhi1998 closed this issue 2 years ago

pengzhi1998 commented 3 years ago

Hi, I'm using your great implementation of PPO (discrete) for another project on robot obstacle avoidance.

Previously I used DDDQN to train the robot's motion, and that training was successful. I then reused the same network and reward function from the DDDQN implementation with this PPO implementation, and tried several different sets of hyper-parameters (varying lr, update_timestep, K_epochs, etc.). Nevertheless, none of them worked.

The robot seems to have learned nothing even after 10 hours of training, and the reward stays very low. Do you know what the problem could be? Is it more likely to be the hyper-parameters, the network structure, or some kind of logical error? Could Python 2 be a problem? (Python 2 does work with this implementation on CartPole.)
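To be concrete, the settings I've been varying look roughly like this (the names follow the repo's training script, but the values below are only illustrative defaults, not the exact ones I tried):

```python
# Illustrative PPO hyper-parameters (typical starting points, not the exact values used)
lr = 0.002              # learning rate for the actor-critic optimizer
gamma = 0.99            # discount factor
K_epochs = 4            # optimization epochs per PPO update
eps_clip = 0.2          # PPO clip parameter
update_timestep = 2000  # run a PPO update every N environment steps
```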

I really look forward to your reply, and thank you again for your PPO implementation!

AlpoGIT commented 3 years ago

Hi, I'm not an expert, but I implemented PPO from scratch (so it's messy) for another project (see my humble repo). My conclusion was that PPO won't work on its own; it needs all the standard tricks.
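For example, a rough sketch of the usual add-ons (advantage normalization, an entropy bonus, and gradient clipping) on top of the clipped objective. This is only illustrative, not the code in this repo; it assumes an actor-critic `policy` with an `evaluate(states, actions)` method returning log-probs, state values, and entropy, and that rollout tensors have already been collected:

```python
import torch
import torch.nn as nn

def ppo_update(policy, optimizer, states, actions, old_logprobs, returns, values,
               eps_clip=0.2, vf_coef=0.5, ent_coef=0.01, max_grad_norm=0.5):
    # Trick 1: normalize advantages per batch
    advantages = returns - values
    advantages = (advantages - advantages.mean()) / (advantages.std() + 1e-8)

    logprobs, state_values, entropy = policy.evaluate(states, actions)

    # Clipped surrogate objective
    ratios = torch.exp(logprobs - old_logprobs)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages

    # Trick 2: entropy bonus to keep the policy exploring
    loss = (-torch.min(surr1, surr2)
            + vf_coef * nn.functional.mse_loss(state_values, returns)
            - ent_coef * entropy).mean()

    optimizer.zero_grad()
    loss.backward()
    # Trick 3: clip the gradient norm before stepping
    torch.nn.utils.clip_grad_norm_(policy.parameters(), max_grad_norm)
    optimizer.step()
```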

pengzhi1998 commented 3 years ago

Thank you! @AlpoGIT I'll take a look :)

nikhilbarhate99 commented 2 years ago

You can check the April update, which is a bit more stable. A better advantage estimate, such as GAE, should also help stabilize training.
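For reference, a minimal sketch of how GAE is typically computed from a rollout (illustrative, not the code in this repo; it assumes per-step lists of rewards, value estimates, and done flags, plus the value estimate of the final state for bootstrapping):

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    advantages = []
    gae = 0.0
    next_value = last_value
    # Work backwards through the rollout
    for reward, value, done in zip(reversed(rewards), reversed(values), reversed(dones)):
        mask = 1.0 - float(done)                              # stop bootstrapping at episode ends
        delta = reward + gamma * next_value * mask - value    # one-step TD error
        gae = delta + gamma * lam * mask * gae                # discounted sum of TD errors
        advantages.insert(0, gae)
        next_value = value
    # Returns used as value-function targets
    returns = [adv + val for adv, val in zip(advantages, values)]
    return advantages, returns
```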