kengz / SLM-Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".
https://slm-lab.gitbook.io/slm-lab/
MIT License

How to improve the convergence performance of training loss? #510

Open williamyuanv0 opened 2 years ago

williamyuanv0 commented 2 years ago

Hi kengz, I find that the training loss (= value loss + policy loss) of the PPO algorithm applied to the game Pong converges poorly (see Fig. 1), while the corresponding mean_returns shows a good upward trend and reaches convergence (see Fig. 2). Why is that? How can I improve the convergence of the training loss? I have tried many improved tricks with PPO, but none of them worked.

Fig. 1: ppo_pong_t0_s0_session_graph_eval_loss_vs_frame

Fig. 2: ppo_pong_t0_s0_session_graph_eval_mean_returns_vs_frames
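For reference, the "value loss + policy loss" sum mentioned in the question can be sketched as below. This is a minimal pure-Python illustration of the standard clipped-surrogate PPO objective, not SLM-Lab's actual implementation. The function name `ppo_losses`, the parameters `clip_eps` and `val_loss_coef`, and the omission of an entropy term are all assumptions made for illustration.

```python
from statistics import mean


def clip(x, lo, hi):
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def ppo_losses(ratios, advantages, v_preds, v_targets,
               clip_eps=0.2, val_loss_coef=1.0):
    """Return (policy_loss, value_loss, total_loss) for one batch.

    ratios:     pi_new(a|s) / pi_old(a|s) for each sample
    advantages: advantage estimate for each sample
    v_preds:    critic predictions V(s)
    v_targets:  value targets (e.g. empirical returns)
    clip_eps and val_loss_coef are hypothetical names and defaults.
    """
    # Clipped surrogate objective: take the more pessimistic of the
    # unclipped and clipped terms for every sample.
    surrogates = [
        min(r * a, clip(r, 1.0 - clip_eps, 1.0 + clip_eps) * a)
        for r, a in zip(ratios, advantages)
    ]
    # The objective is maximized, so the loss is its negative mean.
    policy_loss = -mean(surrogates)
    # Mean squared error between critic predictions and targets.
    value_loss = mean([(v - t) ** 2 for v, t in zip(v_preds, v_targets)])
    total_loss = policy_loss + val_loss_coef * value_loss
    return policy_loss, value_loss, total_loss


if __name__ == "__main__":
    # Ratios of 1 and advantages that average to zero.
    print(ppo_losses([1.0, 1.0], [1.0, -1.0], [0.5, 0.0], [1.0, 0.0]))
```

In this sketch, when the probability ratios are 1 and the advantages average to zero (as they would if advantages were standardized per batch, which is an assumption here), the policy term is 0 regardless of how high the returns are, so the summed loss mostly reflects the value term.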