Hi, thanks for the great repo.
I have a question:
In the function `masked_kl_div` in `ppo.py`, shouldn't the calculation be `prob1*(log(prob1) - log(prob2))`?
As written, the code seems to compute the negative of the KL divergence, which would have to be maximized, whereas the rest of the code assumes it is a loss to be minimized.
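
To make the sign question concrete, here is a minimal sketch in plain Python. It is not the code from `ppo.py`: the names `masked_kl_sketch` and `flipped_sketch`, the list-based inputs, the mask convention (1 = keep, 0 = ignore), and the `eps` smoothing are all assumptions for illustration. It compares `prob1*(log(prob1) - log(prob2))` with the same expression with the two log terms swapped.

```python
import math


def masked_kl_sketch(prob1, prob2, mask, eps=1e-10):
    """Hypothetical helper: KL(prob1 || prob2), averaged over unmasked positions.

    prob1, prob2: lists of probability distributions (each a list of floats).
    mask: list of 0/1 flags; positions with 0 are skipped.
    """
    total, count = 0.0, 0
    for p, q, m in zip(prob1, prob2, mask):
        if not m:
            continue
        # Standard KL term: sum_i p_i * (log p_i - log q_i), which is >= 0.
        total += sum(
            pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q)
        )
        count += 1
    return total / max(count, 1)


def flipped_sketch(prob1, prob2, mask, eps=1e-10):
    """Hypothetical helper: same sum with the log terms swapped (negative KL)."""
    total, count = 0.0, 0
    for p, q, m in zip(prob1, prob2, mask):
        if not m:
            continue
        # Swapped order: sum_i p_i * (log q_i - log p_i), which is <= 0.
        total += sum(
            pi * (math.log(qi + eps) - math.log(pi + eps)) for pi, qi in zip(p, q)
        )
        count += 1
    return total / max(count, 1)


p = [[0.7, 0.3], [0.9, 0.1]]
q = [[0.5, 0.5], [0.1, 0.9]]
mask = [1, 0]  # the second position is ignored

kl = masked_kl_sketch(p, q, mask)
neg = flipped_sketch(p, q, mask)
print(kl > 0, neg < 0)
```

With the swapped order the value is never positive, so minimizing it would push the two distributions apart rather than together.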