Closed seungjaeryanlee closed 5 years ago
They introduced the new loss in the implementation of PPO2: https://github.com/openai/baselines/blob/master/baselines/ppo2/model.py#L63
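For reference, the value-function clipping at that line can be sketched in PyTorch like this (a paraphrase of the Baselines TensorFlow code; the function and variable names here are mine, not from either repo):

```python
import torch

def clipped_value_loss(values, old_values, returns, clip_range):
    # Keep the new value prediction within clip_range of the old prediction
    values_clipped = old_values + torch.clamp(
        values - old_values, -clip_range, clip_range
    )
    # Squared error for both the raw and the clipped predictions
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    # Pessimistic elementwise max, mirroring the baselines ppo2 loss
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```

The elementwise max makes the loss pessimistic: the critic cannot reduce its loss by moving the value estimate far away from the old prediction in a single update, analogous to the policy's clipped surrogate objective.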
Also see grad normalization here: https://github.com/openai/baselines/blob/master/baselines/ppo2/model.py#L102
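And the gradient normalization at that line (done with `tf.clip_by_global_norm` in Baselines) corresponds to `nn.utils.clip_grad_norm_` in PyTorch. A minimal sketch, with a toy model purely for illustration:

```python
import torch
import torch.nn as nn

# Toy model and loss, just to produce some gradients
model = nn.Linear(4, 1)
loss = model(torch.ones(1, 4)).pow(2).sum()
loss.backward()

# Rescale all gradients together so their global L2 norm
# is at most max_grad_norm; returns the pre-clip norm
max_grad_norm = 0.5
total_norm = nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
```

Note this clips the norm over all parameters jointly, not per tensor, which matches the global-norm clipping in Baselines.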
Thank you for the links! I see how they correspond to those parts of PPO2 in OpenAI Baselines.
It's unfortunate that these changes are not documented in any paper. I guess I will have to read the openai/baselines code as well.
Hello! I was documenting your PPO code (`algo/ppo.py`) to improve my understanding of the algorithm, and I got confused by `max_grad_norm` and `use_clipped_value_loss`.

If I am understanding this correctly, `max_grad_norm` is given to `nn.utils.clip_grad_norm_()` to set a maximum gradient size. However, I could not find the relevant detail in the paper "Proximal Policy Optimization Algorithms". If it was explicitly mentioned there, would you please point it out for me?

For L^VF, the paper seems to use the simple squared loss, equivalent to `use_clipped_value_loss=False`, but I could not find anything about the case when `use_clipped_value_loss=True`. Is this a trick not mentioned in the paper?

Thank you in advance for your help. Happy holidays!