Training plot for boxing giving values much greater than Evaluation

hengyuan-hu / rainbow

A PyTorch implementation of Rainbow DQN agent

Training plot for boxing giving values much greater than Evaluation #3

Open Ashutosh-Adhikari opened 6 years ago

Ashutosh-Adhikari commented 6 years ago

During training, the summed clipped reward the agent gets is much higher than its evaluation score for Boxing, and much lower than its evaluation score for games like Qbert and Space Invaders. Any idea why?

Ashutosh-Adhikari commented 6 years ago

Hi, is it because of clipping the rewards?

hengyuan-hu commented 6 years ago

I am not sure I understand your question correctly. Different games have different reward mechanisms. Some games have dense reward signals while others have sparse ones (for example, in Space Invaders the agent only gets a reward when it hits an enemy).
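
On the clipping question raised above, here is a minimal sketch of how a summed clipped reward can differ from a raw score. It assumes DQN-style clipping of each per-step reward to [-1, 1], and that the training plot sums clipped rewards while evaluation reports the raw game score; neither is confirmed in this thread. The function name and the reward sequences are hypothetical, made up for illustration and not taken from real game logs.

```python
def raw_and_clipped_return(rewards):
    """Return (raw_sum, clipped_sum) for one episode's per-step rewards.

    Clipping here means limiting each reward to [-1, 1] before summing,
    which is an assumption about how the training curve is computed.
    """
    raw = float(sum(rewards))
    clipped = float(sum(max(-1.0, min(1.0, r)) for r in rewards))
    return raw, clipped


# Hypothetical episode with large per-event rewards (e.g. tens of points
# per event): clipping shrinks the sum dramatically.
large_rewards = [25, 25, 0, 100, 0, 25]

# Hypothetical episode whose rewards already lie within [-1, 1]:
# clipping changes nothing.
small_rewards = [1, 0, -1, 1, 1, 0]

print(raw_and_clipped_return(large_rewards))  # (175.0, 4.0)
print(raw_and_clipped_return(small_rewards))  # (2.0, 2.0)
```

Under these assumptions, a game with large per-event rewards would show a training curve far below its evaluation score, which matches the "much lower" case. Clipping alone can never make the clipped sum exceed the raw sum when all rewards are non-negative, so it would not by itself explain a training value that is higher than evaluation.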