michaelnny / deep_rl_zoo

A collection of Deep Reinforcement Learning algorithms implemented with PyTorch to solve Atari games and classic control tasks like CartPole, LunarLander, and MountainCar.
Apache License 2.0

Wrong RND implementation #12

Closed: michaelnny closed this 1 year ago

michaelnny commented 1 year ago
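  1. RND should take only a single frame (the most recent one) as input, not the stacked frames
  2. The intrinsic reward should be the squared error between the predictor and target network outputs
  3. OpenAI uses a forward filter to compute running discounted returns of the intrinsic rewards, then normalizes the intrinsic rewards by the standard deviation of those returns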

https://github.com/openai/random-network-distillation/blob/f75c0f1efa473d5109d487062fd8ed49ddce6634/ppo_agent.py#L257
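
For reference, the three points above can be sketched roughly as follows in NumPy. This is only an illustration of the idea, not the code from this repo or from OpenAI: the helper names (`last_frame`, `intrinsic_reward`, `normalize_intrinsic_rewards`), the `(batch, stack, H, W)` layout with the newest frame last, and averaging the squared error over the feature dimension are all assumptions made for the example.

```python
import numpy as np


class RewardForwardFilter:
    """Running discounted sum of intrinsic rewards, one value per environment."""

    def __init__(self, gamma):
        self.gamma = gamma
        self.rewems = None

    def update(self, rews):
        if self.rewems is None:
            self.rewems = rews.copy()
        else:
            self.rewems = self.rewems * self.gamma + rews
        return self.rewems


class RunningMeanStd:
    """Tracks a running mean and variance over all values seen so far."""

    def __init__(self, epsilon=1e-4):
        self.mean, self.var, self.count = 0.0, 1.0, epsilon

    def update(self, x):
        batch_mean, batch_var, batch_count = x.mean(), x.var(), x.size
        delta = batch_mean - self.mean
        total = self.count + batch_count
        # Parallel-variance merge of the old statistics with the new batch.
        m2 = (self.var * self.count + batch_var * batch_count
              + delta ** 2 * self.count * batch_count / total)
        self.mean = self.mean + delta * batch_count / total
        self.var = m2 / total
        self.count = total


def last_frame(stacked_obs):
    """Point 1: (batch, stack, H, W) -> (batch, 1, H, W), newest frame only.
    Assumes the most recent frame is stored last along axis 1."""
    return stacked_obs[:, -1:, :, :]


def intrinsic_reward(pred_feat, target_feat):
    """Point 2: squared error between predictor and target features.
    Averaging over the feature axis is an assumption of this sketch."""
    return np.mean((pred_feat - target_feat) ** 2, axis=-1)


def normalize_intrinsic_rewards(rews_int, rff, rms):
    """Point 3: rews_int has shape (num_envs, num_steps).
    Filter step by step, update running stats, divide by the return std."""
    rffs = np.array([rff.update(step_rews) for step_rews in rews_int.T])
    rms.update(rffs.ravel())
    return rews_int / np.sqrt(rms.var)
```

In a rollout loop, one `RewardForwardFilter` and one `RunningMeanStd` would be kept alive across updates, so the normalization reflects the whole training history rather than a single batch.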