nagataka / Read-a-Paper

Survey

Unifying Count-Based Exploration and Intrinsic Motivation #23

Open nagataka opened 5 years ago

nagataka commented 5 years ago

Summary

Link

https://arxiv.org/abs/1606.01868

Author/Institution

Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos (DeepMind)

What is this

Proposes a 'pseudo-count' approach that drives an agent to explore an unknown environment.

Comparison with previous research. What are the novelties/good points?

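Advantage compared to the count-based approach: pseudo-counts are derived from a density model, so they generalize across similar states and remain usable in large (non-tabular) state spaces, where most states are rarely visited more than once and exact visit counts carry little information.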

Key points

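Solve the following system of equations for $\hat{N}_n(x)$ and $\hat{n}$:

$$\rho_n(x) = \frac{\hat{N}_n(x)}{\hat{n}}, \qquad \rho'_n(x) = \frac{\hat{N}_n(x) + 1}{\hat{n} + 1}$$

where $\rho_n(x) = \rho(x; x_{1:n})$ is the probability the density model assigns to $x$ after observing $x_{1:n}$, and $\rho'_n(x) = \rho(x; x_{1:n}x)$ is the probability assigned to $x$ by the density model after observing a new occurrence of $x$. The density model is Context Tree Switching (CTS; Bellemare et al., 2014).

$\hat{N}_n(x)$ is the pseudo-count function and $\hat{n}$ is the pseudo-count total.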
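To make the computation concrete: assuming the system is the paper's $\rho_n(x) = \hat{N}_n(x)/\hat{n}$, $\rho'_n(x) = (\hat{N}_n(x)+1)/(\hat{n}+1)$, eliminating $\hat{n}$ gives the closed form $\hat{N}_n(x) = \rho_n(x)\,(1 - \rho'_n(x)) / (\rho'_n(x) - \rho_n(x))$. The sketch below is only an illustration, not the authors' implementation. `EmpiricalDensity`, `pseudo_count`, and `observe` are hypothetical names, and the toy empirical-frequency model stands in for CTS. With this toy model the pseudo-count should coincide with the true visit count.

```python
from collections import Counter


class EmpiricalDensity:
    """Toy density model (an assumption, NOT the CTS model from the paper):
    the empirical distribution over the states observed so far."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def prob(self, x):
        # Probability of x under the current empirical distribution.
        return self.counts[x] / self.total if self.total else 0.0

    def update(self, x):
        # Record one more occurrence of x.
        self.counts[x] += 1
        self.total += 1


def pseudo_count(rho, rho_prime):
    """Solve rho = N/n and rho' = (N+1)/(n+1) for the pseudo-count N."""
    if rho_prime <= rho:
        # The closed form needs the model to raise the probability of x
        # after seeing x again; otherwise the pseudo-count is undefined.
        raise ValueError("requires rho_prime > rho")
    return rho * (1.0 - rho_prime) / (rho_prime - rho)


def observe(model, x):
    """Return the pseudo-count of x (before the update), then update the model."""
    rho = model.prob(x)        # probability before seeing x again
    model.update(x)
    rho_prime = model.prob(x)  # recoding probability after seeing x once more
    return pseudo_count(rho, rho_prime)
```

Any density model exposing `prob` and `update` could be swapped in. The guard in `pseudo_count` covers the degenerate case where the model's probability for `x` does not increase after an extra observation, for example when `x` is the only state seen so far.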

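Add an exploration bonus to the reward, computed from the pseudo-count $\hat{N}_n(x)$ of the visited state:

$$R^+_n(x, a) = \beta \left(\hat{N}_n(x) + 0.01\right)^{-1/2}$$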
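As a rough sketch of the bonus, assuming it has the form $\beta\,(\hat{N}_n(x) + 0.01)^{-1/2}$ as in the paper: the function name `exploration_bonus` is hypothetical, and the default `beta=0.05` is my recollection of the value used in the paper's Atari experiments, so treat it as an assumption to be tuned.

```python
import math


def exploration_bonus(n_hat, beta=0.05, eps=0.01):
    """Bonus that shrinks as the pseudo-count n_hat of a state grows.

    beta scales the bonus (assumed default); eps keeps the bonus finite
    when the pseudo-count is zero.
    """
    if n_hat < 0:
        raise ValueError("pseudo-count must be non-negative")
    return beta / math.sqrt(n_hat + eps)


def augmented_reward(reward, n_hat, beta=0.05):
    """Environment reward plus the count-based exploration bonus."""
    return reward + exploration_bonus(n_hat, beta=beta)
```

A never-seen state (pseudo-count 0) receives the largest bonus, and the bonus decays roughly as one over the square root of the pseudo-count as the state is revisited.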

How did the authors demonstrate the effectiveness of the proposal?

Experiments on Atari 2600 games using the Arcade Learning Environment (ALE). The result on Montezuma's Revenge, a hard-exploration game, is especially notable.

Any discussions?

Quote from the paper:

Induced metric

We did not address the question of where the generalization comes from. Clearly, the choice of density model induces a particular metric over the state space. A better understanding of this metric should allow us to tailor the density model to the problem of exploration.

Compatible value function

There may be a mismatch in the learning rates of the density model and the value function: DQN learns much more slowly than our CTS model. As such, it should be beneficial to design value functions compatible with density models (or vice-versa).

The continuous case

Although we focused here on countable state spaces, we can as easily define a pseudo-count in terms of probability density functions. At present it is unclear whether this provides us

What should I read next?

Exploration by Random Network Distillation

nagataka commented 5 years ago

https://github.com/brendanator/atari-rl

https://github.com/mgbellemare/SkipCTS