Wrong RND implementation

michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.

Apache License 2.0


Wrong RND implementation #12

Closed: michaelnny closed this issue 1 year ago

michaelnny commented 1 year ago

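RND should take only a single frame as input, not stacked frames.
The intrinsic reward should be the squared distance (error) between the outputs of the predictor network and the target network.
OpenAI uses a reward forward filter to compute discounted returns of the intrinsic reward, then normalizes the intrinsic reward by the running standard deviation of those returns.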
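A minimal sketch of the first two points, assuming frames are stacked on axis 1 with the newest frame last, and using fixed random linear projections as hypothetical stand-ins for the target and predictor CNNs (`select_last_frame`, `embed` and `intrinsic_reward` are illustrative names, not deep_rl_zoo functions). OpenAI's code averages the squared error over the feature dimension; summing instead only changes the reward by a constant factor.

```python
import numpy as np

# Hypothetical sizes, kept small for illustration.
H = W = 8
FEAT = 16

rng = np.random.default_rng(0)
# Stand-ins for the networks: the target is fixed and random; the predictor would be trained.
W_target = rng.standard_normal((H * W, FEAT))
W_pred = rng.standard_normal((H * W, FEAT))


def select_last_frame(stacked_obs: np.ndarray) -> np.ndarray:
    """Keep only the newest frame from a stack shaped (batch, frames, H, W)."""
    return stacked_obs[:, -1:, :, :]


def embed(frames: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Flatten single-frame observations and project them to feature space."""
    flat = frames.reshape(frames.shape[0], -1)  # (batch, H*W) for one frame
    return flat @ weights


def intrinsic_reward(target_feat: np.ndarray, pred_feat: np.ndarray) -> np.ndarray:
    """Per-sample squared error between target and predictor features, averaged over features."""
    return np.mean(np.square(target_feat - pred_feat), axis=-1)


# Usage: a batch of 2 observations, each a stack of 4 frames.
stacked = rng.random((2, 4, H, W))
single = select_last_frame(stacked)
r_int = intrinsic_reward(embed(single, W_target), embed(single, W_pred))
```

Passing the full 4-frame stack to `embed` would fail here, since the projections expect `H * W` inputs, which mirrors the point that the RND networks see one frame.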

https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/ppo_agent.py#L257
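A sketch of the normalization in the linked OpenAI code, rewritten as standalone numpy: a forward filter accumulates discounted intrinsic returns per environment, a running variance is tracked over those returns, and the raw intrinsic rewards are divided by the running standard deviation (no mean is subtracted). The scalar `RunningMeanStd` below is a simplified version, `normalize_intrinsic` is a hypothetical helper, and the intrinsic discount of 0.99 is an assumption.

```python
import numpy as np


class RewardForwardFilter:
    """Discounted running sum of rewards, one entry per parallel environment."""

    def __init__(self, gamma: float):
        self.gamma = gamma
        self.rewems = None

    def update(self, rews: np.ndarray) -> np.ndarray:
        if self.rewems is None:
            self.rewems = rews.astype(np.float64).copy()
        else:
            self.rewems = self.rewems * self.gamma + rews
        return self.rewems


class RunningMeanStd:
    """Scalar running mean/variance, merged batch by batch (parallel algorithm)."""

    def __init__(self, epsilon: float = 1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon

    def update(self, x: np.ndarray) -> None:
        batch_mean, batch_var, batch_count = float(np.mean(x)), float(np.var(x)), x.size
        delta = batch_mean - self.mean
        tot = self.count + batch_count
        m2 = self.var * self.count + batch_var * batch_count + delta**2 * self.count * batch_count / tot
        self.mean = self.mean + delta * batch_count / tot
        self.var = m2 / tot
        self.count = tot


def normalize_intrinsic(buf_rews_int: np.ndarray, rff: RewardForwardFilter, rms: RunningMeanStd) -> np.ndarray:
    """buf_rews_int has shape (num_envs, num_steps); returns rewards scaled by the return std."""
    # Step through time; each update receives the rewards of all envs at step t.
    rffs = np.array([rff.update(rew) for rew in buf_rews_int.T])
    rms.update(rffs.ravel())
    return buf_rews_int / np.sqrt(rms.var)


# Usage: 3 environments, 5 rollout steps of raw intrinsic rewards.
rff = RewardForwardFilter(gamma=0.99)
rms = RunningMeanStd()
raw = np.abs(np.random.default_rng(1).standard_normal((3, 5)))
normed = normalize_intrinsic(raw, rff, rms)
```

The filter and the running statistics persist across rollouts, so the scale adapts slowly rather than being recomputed from each batch.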