A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
The wrapper that collects 'raw_reward' in the info dict should be applied after the frame-skip and frame-stack wrappers. The current code applies it before them, so the same reward is counted multiple times.
The issue can be reproduced by running any agent on Atari Pong. At the start of training, episode returns should be around -20 or -21, but with the current code the reported episode returns are seemingly random values.
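The sketch below illustrates the proposed ordering. It does not use the project's code or Gym. ToyEnv, FrameSkip, FrameStack, RecordRawReward and make_env are hypothetical stand-ins. The recorder is the outermost wrapper, so each agent step's info carries exactly one 'raw_reward' value: the sum of the frames skipped in that step. Summing it over an episode therefore counts every frame's reward once.

```python
from collections import deque


class ToyEnv:
    """Hypothetical stand-in for an Atari env: reward -1 on every 5th frame, 40 frames per episode."""

    def __init__(self, length=40):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        reward = -1.0 if self.t % 5 == 0 else 0.0
        done = self.t >= self.length
        return self.t, reward, done, {}


class FrameSkip:
    """Repeats the action `skip` times and sums the rewards of the skipped frames."""

    def __init__(self, env, skip=4):
        self.env, self.skip = env, skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total += reward
            if done:
                break
        return obs, total, done, info


class FrameStack:
    """Keeps the last k observations; rewards pass through unchanged."""

    def __init__(self, env, k=4):
        self.env, self.k = env, k
        self.frames = deque(maxlen=k)

    def reset(self):
        obs = self.env.reset()
        for _ in range(self.k):
            self.frames.append(obs)
        return list(self.frames)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.frames.append(obs)
        return list(self.frames), reward, done, info


class RecordRawReward:
    """Copies the per-agent-step reward into info['raw_reward']."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        info = dict(info)
        info["raw_reward"] = reward
        return obs, reward, done, info


def make_env():
    env = ToyEnv()
    env = FrameSkip(env, skip=4)
    env = FrameStack(env, k=4)
    return RecordRawReward(env)  # applied after frame skip and frame stack


def episode_return(env):
    """Sums info['raw_reward'] over one episode with a fixed action."""
    env.reset()
    done, total = False, 0.0
    while not done:
        _, _, done, info = env.step(0)
        total += info["raw_reward"]
    return total


print(episode_return(make_env()))  # -8.0: 8 rewarded frames, each counted once
```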