nikhilbarhate99 / PPO-PyTorch

Minimal implementation of clipped objective Proximal Policy Optimization (PPO) in PyTorch
MIT License
1.63k stars 340 forks

Shared parameters for NN action_layer and NN value_layer #19

Closed ArnoudWellens closed 4 years ago

ArnoudWellens commented 4 years ago

Dear nikhilbarhate99,

First of all, thanks for sharing your work on Github.

About your code, I have a question about the two neural networks, self.action_layer and self.value_layer. To my understanding, these two networks are completely separate but are updated with the same loss function. Wouldn't it make more sense to share the parameters of the first two layers to increase training speed? Or am I missing something important?
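For context, the two fully separate networks being discussed might look like the following sketch. The layer sizes and activations here are illustrative assumptions, not the repository's actual values:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (not the repository's actual hyperparameters)
state_dim, action_dim, hidden = 4, 2, 64

# Actor: maps a state to a probability distribution over actions
action_layer = nn.Sequential(
    nn.Linear(state_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, hidden), nn.Tanh(),
    nn.Linear(hidden, action_dim), nn.Softmax(dim=-1),
)

# Critic: maps a state to a scalar value estimate
value_layer = nn.Sequential(
    nn.Linear(state_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, hidden), nn.Tanh(),
    nn.Linear(hidden, 1),
)

state = torch.randn(1, state_dim)
probs = action_layer(state)   # shape (1, action_dim), sums to 1
value = value_layer(state)    # shape (1, 1)
```

No parameters are shared here: each network has its own first layers, even though both consume the same state.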

Looking forward to your response.

nikhilbarhate99 commented 4 years ago

Sharing parameters can lead to high variance, because very different gradients flow through the same network: one loss maps states to actions, while the other maps states to rewards/values. As a result, a shared network can sometimes take longer to train despite having fewer parameters, or fail to learn anything useful at all.

Conversely, sharing the parameters of the initial layers can sometimes help when training deeper networks, because those layers can learn a common internal representation of the state.
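A shared-backbone variant might be sketched as below. The trunk and head sizes are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# Illustrative dimensions (not the repository's actual hyperparameters)
state_dim, action_dim, hidden = 4, 2, 64

# Shared trunk: learns a common internal representation of the state
shared = nn.Sequential(
    nn.Linear(state_dim, hidden), nn.Tanh(),
    nn.Linear(hidden, hidden), nn.Tanh(),
)
# Two small heads branch off the shared representation
policy_head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Softmax(dim=-1))
value_head = nn.Linear(hidden, 1)

state = torch.randn(1, state_dim)
features = shared(state)
probs = policy_head(features)   # shape (1, action_dim)
value = value_head(features)    # shape (1, 1)
# Gradients from both the policy loss and the value loss now flow
# through `shared`, which is the source of the variance concern.
```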

As of now, there is no standard way to implement RL algorithms or to select a particular network architecture. This is mostly a hyperparameter choice, and you will need to figure out what works best for your problem by tuning it.

Since the separate configuration worked out in this case, I did not test the shared-parameter version.

ArnoudWellens commented 4 years ago

Thank you for answering this quickly and for your insights.

As a final question: if both networks are fully independent, why did you combine the loss functions? If one loss maps states to rewards/values, couldn't we apply it only to value_layer, and apply the other loss only to action_layer?

Thanks again for your time.

nikhilbarhate99 commented 4 years ago

Again, it is not the only way to do this, but that is how it is defined in the paper, so I went with it.
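The combined objective in the PPO paper takes the form L = L_CLIP - c1 * L_VF + c2 * S (clipped surrogate, value-function error, entropy bonus). A minimal sketch, using dummy tensors and illustrative coefficients rather than the repository's actual values:

```python
import torch

# Illustrative hyperparameters (assumptions, not the repo's values)
eps, c1, c2 = 0.2, 0.5, 0.01

# Dummy per-sample data standing in for a rollout batch
ratios = torch.tensor([1.1, 0.8, 1.3])       # pi_new(a|s) / pi_old(a|s)
advantages = torch.tensor([0.5, -0.2, 1.0])
values = torch.tensor([1.0, 0.5, 0.9])       # critic estimates
returns = torch.tensor([1.2, 0.4, 1.1])      # empirical returns
entropy = torch.tensor([0.6, 0.7, 0.5])      # policy entropy per sample

# Clipped surrogate objective
surr1 = ratios * advantages
surr2 = torch.clamp(ratios, 1 - eps, 1 + eps) * advantages

# One combined loss (minimized, hence the sign flips vs. the paper)
loss = (-torch.min(surr1, surr2)
        + c1 * (values - returns) ** 2
        - c2 * entropy).mean()
# A single backward pass on `loss` then updates the actor via the
# surrogate term and the critic via the value-error term.
```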