nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch

I'm a beginner, and I have a question for PPO_continuous.py #28

Closed GrehXscape closed 4 years ago

GrehXscape commented 4 years ago

Are you using a single loss function to update both the actor and the critic? If so, is there any difference between updating them separately and updating them together? It would really help if you could answer my question.

Good Day!

nikhilbarhate99 commented 4 years ago

In this repo's implementation it does not matter, since the actor and critic networks do not share any weights.
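For illustration, updating them separately when nothing is shared can be as simple as one backward pass and one optimizer step per network. This is only a minimal sketch, not code from this repo; the function name and arguments are hypothetical:

```python
import torch

def update_separately(actor_loss, critic_loss, actor_optimizer, critic_optimizer):
    """Sketch: independent updates for two networks that share no parameters.

    Because the actor and critic have disjoint weights (and the advantages
    used in actor_loss are detached from the critic), each loss only
    produces gradients for its own network.
    """
    actor_optimizer.zero_grad()
    actor_loss.backward()      # gradients reach only the actor's weights
    actor_optimizer.step()

    critic_optimizer.zero_grad()
    critic_loss.backward()     # gradients reach only the critic's weights
    critic_optimizer.step()
```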

But if the actor and critic share some weights, training becomes unstable, because two very different kinds of losses flow through the shared weights (one for actions and the other for values).

In that case a joint loss, with the coefficient of each term treated as a hyperparameter, helps stabilize training.
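Concretely, the joint objective looks something like the sketch below. This is a minimal illustration under the usual clipped-PPO formulation, not this repo's exact code; the 0.5 value coefficient and 0.01 entropy coefficient are examples of the hyperparameters being referred to:

```python
import torch
import torch.nn as nn

def ppo_joint_loss(logprobs, old_logprobs, advantages, state_values, returns,
                   dist_entropy, eps_clip=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped PPO surrogate, value loss, and entropy bonus combined into a
    single scalar, so one backward pass updates all (possibly shared) weights."""
    ratios = torch.exp(logprobs - old_logprobs)                    # pi_new(a|s) / pi_old(a|s)
    surr1 = ratios * advantages
    surr2 = torch.clamp(ratios, 1 - eps_clip, 1 + eps_clip) * advantages
    policy_loss = -torch.min(surr1, surr2)                         # maximizing the surrogate -> negate
    value_loss = nn.functional.mse_loss(state_values, returns, reduction='none')
    loss = policy_loss + value_coef * value_loss - entropy_coef * dist_entropy
    return loss.mean()
```

With a single scalar loss, one optimizer over all parameters (shared trunk plus both heads) is stepped once per update, and value_coef / entropy_coef control how strongly the value-fitting and exploration terms pull on any shared layers.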

Refer to #19.