Open xysun opened 5 years ago
RND achieves better-than-human performance on Montezuma's Revenge, with no demonstrations and no access to the underlying state
Previous work:
Random Network Distillation (RND) to avoid the noisy-TV problem:
Implementation
Need to understand more on:
theoretically, how does RND avoid the noisy-TV problem?
The intuition is that predictive models have low error in states similar to the ones they have been trained on. In particular the agent’s predictions of the output of a randomly initialized neural network will be less accurate in novel states than in states the agent visited frequently. The advantage of using a synthetic prediction problem is that we can have it be deterministic (bypassing Factor 2) and inside the class of functions the predictor can represent (bypassing Factor 3) by choosing the predictor to be of the same architecture as the target network. These choices make RND immune to the noisy-TV problem.
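The idea above can be sketched in a few lines: a fixed, randomly initialized target network, a predictor of the same architecture trained to match it, and the prediction error used as the novelty signal. This is a minimal NumPy sketch (linear "networks", a single familiar state, made-up dimensions and learning rate), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim = 8, 16

# Fixed, randomly initialized target network (never trained).
W_target = rng.normal(size=(obs_dim, feat_dim))

# Predictor of the same architecture, trained to match the target.
W_pred = rng.normal(size=(obs_dim, feat_dim))

def intrinsic_reward(obs):
    """MSE between predictor and fixed random target features."""
    err = obs @ W_pred - obs @ W_target
    return float(np.mean(err ** 2))

def train_predictor(obs, lr=0.01):
    """One gradient step on the predictor's MSE for this observation."""
    global W_pred
    err = obs @ W_pred - obs @ W_target       # (feat_dim,)
    grad = np.outer(obs, err) * 2 / feat_dim  # d(MSE)/d(W_pred)
    W_pred -= lr * grad

# A frequently visited state: prediction error shrinks with training.
familiar = rng.normal(size=obs_dim)
before = intrinsic_reward(familiar)
for _ in range(500):
    train_predictor(familiar)
after = intrinsic_reward(familiar)

# A novel state: the predictor was never trained on it, so the
# error (and hence the exploration bonus) stays high.
novel = rng.normal(size=obs_dim)
```

Because the target is a deterministic function of the observation and the predictor has the same architecture, the error can in principle be driven to zero on visited states, which is exactly why a stochastic noisy TV doesn't keep the bonus high forever.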
follow up readings:
things to try
Train a curious agent on many different environments without reward and investigate the transfer to target environments with rewards.
Goal:
Proposed approach: