nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

update() function to minimize, rather than maximize? #7

Closed · BigBadBurrow closed this issue 4 years ago

BigBadBurrow commented 4 years ago

Hello, thank you for such a clear example of PPO in PyTorch. Do you know how the update() method could be modified to minimize rather than maximize? In my case I want to minimize a regret factor rather than maximize a reward. Many thanks.

Another question: why use Tanh() activation instead of ReLU()?

nikhilbarhate99 commented 4 years ago

Hey, I would suggest storing the regrets as negative rewards, i.e. while appending the rewards, append -regret. Alternatively, in the update() function do rewards = -rewards.
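A minimal sketch of both options, assuming a rollout buffer that holds a plain list of per-step values (the names and numbers below are illustrative, not the repo's exact API):

```python
import torch

# Hypothetical per-step regrets collected during a rollout.
regrets = [0.5, 0.2, 0.9, 0.1]

# Option 1: store -regret at collection time, so the PPO objective
# (which maximizes expected return) ends up minimizing regret.
stored_rewards = [-r for r in regrets]

# Option 2: keep the regrets as-is in the buffer and flip the sign
# inside update(), right after converting the stored values to a tensor.
rewards = torch.tensor(regrets, dtype=torch.float32)
rewards = -rewards  # maximizing -regret is the same as minimizing regret

print(stored_rewards)  # [-0.5, -0.2, -0.9, -0.1]
print(rewards)         # tensor([-0.5000, -0.2000, -0.9000, -0.1000])
```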

The choice of activation function depends on the environment; I have found that Tanh performs slightly better than ReLU, and most other PPO implementations also use Tanh.
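For comparison, here is a sketch of a small actor MLP in the style typically used in PPO implementations; swapping the activation is just a matter of replacing nn.Tanh() with nn.ReLU(). The dimensions and layer sizes are illustrative, not the repo's exact values:

```python
import torch.nn as nn

state_dim, action_dim, hidden = 8, 4, 64  # illustrative sizes

# Actor with Tanh activations (the common PPO default).
actor_tanh = nn.Sequential(
    nn.Linear(state_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, hidden), nn.Tanh(),
    nn.Linear(hidden, action_dim), nn.Softmax(dim=-1),
)

# Same architecture with ReLU activations, for comparison.
actor_relu = nn.Sequential(
    nn.Linear(state_dim, hidden), nn.ReLU(),
    nn.Linear(hidden, hidden), nn.ReLU(),
    nn.Linear(hidden, action_dim), nn.Softmax(dim=-1),
)
```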

BigBadBurrow commented 4 years ago

Sorry, yes of course. I think I need more sleep ha ha