Hi kengz, I find that the training loss (= value loss + policy loss) of the PPO algorithm applied to the game Pong converges poorly (see Fig.1), while the corresponding mean_returns shows a good upward trend and reaches convergence (see Fig.2).
Why is that? And how can I improve the convergence of the training loss? I have tried many PPO improvement tricks, but none of them worked.
Fig.1
Fig.2
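To be concrete, the training loss I am plotting is the sum of the clipped policy surrogate and the value (critic) loss, roughly like the minimal sketch below. The function and coefficient names here are my own assumptions for illustration, not SLM Lab's exact implementation, and the entropy bonus is omitted:

```python
import numpy as np

def ppo_loss(ratios, advantages, values, returns, clip_eps=0.2, val_coef=0.5):
    """Combined PPO loss = clipped policy loss + weighted value loss.

    ratios:     pi_new(a|s) / pi_old(a|s) per sample
    advantages: advantage estimates (e.g. from GAE)
    values:     critic value predictions
    returns:    empirical returns used as value targets
    """
    # Clipped surrogate objective: take the pessimistic (minimum) of the
    # unclipped and clipped terms, then negate to get a loss to minimize.
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -np.mean(np.minimum(unclipped, clipped))

    # Value loss: mean squared error between critic predictions and returns.
    value_loss = np.mean((values - returns) ** 2)

    return policy_loss + val_coef * value_loss
```

Note that even when the policy is improving, this quantity need not decrease monotonically: the advantages and value targets are recomputed from fresh rollouts every iteration, so the loss is measured against a moving target.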