Summary
Link
https://arxiv.org/abs/1606.01868
Author/Institution
Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Remi Munos (DeepMind)
What is this
Proposes a 'pseudo-count' approach for having an agent explore an unknown environment.
Comparison with previous research. What are the novelties/good points?
Advantage compared to count-based approaches: raw visit counts do not generalize beyond tabular settings, whereas pseudo-counts derived from a density model extend count-based exploration bonuses to large state spaces.
Key points
Solve the following equations for $\hat N_n(x)$ and $\hat n$:

$\rho_n(x) = \dfrac{\hat N_n(x)}{\hat n}, \qquad \rho'_n(x) = \dfrac{\hat N_n(x) + 1}{\hat n + 1}$

where $\rho_n(x)$ is the probability the density model assigns to $x$, and $\rho'_n(x)$ is the probability assigned to $x$ by the density model after observing a new occurrence of $x$. The density model is Context Tree Switching (Bellemare et al., 2014). $\hat N_n(x)$ is a pseudo-count function and $\hat n$ is a pseudo-count total. Solving gives

$\hat N_n(x) = \dfrac{\rho_n(x)\,(1 - \rho'_n(x))}{\rho'_n(x) - \rho_n(x)}$

Add an exploration bonus $R^+_n(x, a) = \beta\,(\hat N_n(x) + 0.01)^{-1/2}$ to the reward.
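The pseudo-count recipe above can be sketched in a few lines. As a sanity check, with the empirical distribution standing in for the paper's CTS density model, the pseudo-count recovers the true visit count exactly; `EmpiricalModel`, `pseudo_count`, and `exploration_bonus` are illustrative names, not from the paper.

```python
from collections import Counter

def pseudo_count(rho, rho_prime):
    # N-hat = rho * (1 - rho') / (rho' - rho), from the model's probability
    # of x before (rho) and after (rho_prime) one more observation of x.
    return rho * (1.0 - rho_prime) / (rho_prime - rho)

def exploration_bonus(n_hat, beta=0.05):
    # Bonus R+ = beta * (N-hat + 0.01)^(-1/2), added to the reward.
    return beta * (n_hat + 0.01) ** -0.5

class EmpiricalModel:
    """Toy stand-in density model (the paper uses Context Tree Switching)."""
    def __init__(self):
        self.counts = Counter()
        self.total = 0
    def prob(self, x):
        return self.counts[x] / self.total
    def update(self, x):
        self.counts[x] += 1
        self.total += 1

model = EmpiricalModel()
for x in ["a", "a", "b", "a"]:
    model.update(x)

rho = model.prob("a")        # probability of "a" before a new occurrence
model.update("a")
rho_prime = model.prob("a")  # probability of "a" after observing it again

n_hat = pseudo_count(rho, rho_prime)
print(n_hat)                   # approximately 3.0: the true count of "a"
print(exploration_bonus(n_hat))
```

For the empirical model the recovery is exact up to floating-point error; with a learned density model such as CTS, the pseudo-count is only an estimate of a visit count, which is precisely what makes it usable in non-tabular state spaces.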
How did the authors prove the effectiveness of the proposal?
Evaluated on Atari 2600 games using the Arcade Learning Environment (ALE). In particular, Montezuma's Revenge, a hard-exploration game, showed a remarkable result.
Any discussions?
Quote from the paper:
Induced metric
We did not address the question of where the generalization comes from. Clearly, the choice of density model induces a particular metric over the state space. A better understanding of this metric should allow us to tailor the density model to the problem of exploration.
Compatible value function
There may be a mismatch in the learning rates of the density model and the value function: DQN learns much more slowly than our CTS model. As such, it should be beneficial to design value functions compatible with density models (or vice-versa).
The continuous case
Although we focused here on countable state spaces, we can as easily define a pseudo-count in terms of probability density functions. At present it is unclear whether this provides us
What should I read next?
Exploration by Random Network Distillation