Open seungjaeryanlee opened 5 years ago
As shown in the figure above, the value estimation loss explodes. This happens with both PPO (shown above) and RND, so I suspect it is caused by how I am handling the Atari environment.