They introduce a new network component, the DND (Differentiable Neural Dictionary): essentially a dictionary that accepts arbitrary keys (in particular, learned embeddings) and answers a query by returning a kernel-weighted average of the stored values whose keys are nearest to it. Because this lookup is a smooth function of the keys, the whole structure is differentiable.
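To make the lookup concrete, here is a minimal NumPy sketch of the kernel-weighted read. The paper uses the inverse-distance kernel k(h, h_i) = 1 / (||h - h_i||^2 + delta) and restricts the average to the p nearest neighbours; the differentiability itself isn't shown here, since that needs an autodiff framework.

```python
import numpy as np

def dnd_lookup(query, keys, values, p=50, delta=1e-3):
    """Kernel-weighted lookup into a DND (illustrative sketch).

    query:  (d,)   embedding used as the lookup key
    keys:   (n, d) keys already stored in the dictionary
    values: (n,)   value estimates paired with those keys
    p:      number of nearest neighbours to average over
    """
    # Squared Euclidean distance from the query to every stored key.
    dists = np.sum((keys - query) ** 2, axis=1)
    nearest = np.argsort(dists)[: min(p, len(keys))]
    # Inverse-distance kernel from the paper; delta avoids division
    # by zero when the query exactly matches a stored key.
    k = 1.0 / (dists[nearest] + delta)
    w = k / k.sum()
    # The read-out is the kernel-weighted average of the stored values.
    return float(np.dot(w, values[nearest]))
```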
The agent's network basically operates in two steps (see the sketch after this list):

1. A convolutional network embeds the current observation into a key h.
2. For each action, h queries that action's DND, and the kernel-weighted read-out is the Q-value estimate for that action.
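Putting the two steps together, a minimal sketch reusing `dnd_lookup` from above; `embed` and the `(keys, values)` pairs are illustrative names, not the paper's API:

```python
def q_values(observation, embed, dnds):
    """Two-step forward pass (sketch): embed, then one DND lookup per action.

    embed: hypothetical embedding network (a CNN in the paper)
    dnds:  list of (keys, values) arrays, one memory per action
    """
    h = embed(observation)  # step 1: compute the key from the observation
    # step 2: query each action's DND; the read-out is that action's Q-value
    return [dnd_lookup(h, keys, values) for keys, values in dnds]
```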
They also maintain a replay buffer storing transition tuples (previous observation, action, reward, next observation), and train on random mini-batches sampled from it, as in standard DQN-style training.
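A buffer of this kind can be sketched in a few lines; the capacity and uniform sampling below are generic choices for illustration, not details taken from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal FIFO transition buffer (sketch, not the paper's exact code)."""

    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)  # oldest tuples are evicted first

    def add(self, obs, action, reward, next_obs):
        self.data.append((obs, action, reward, next_obs))

    def sample(self, batch_size):
        # Uniform random mini-batch, as in standard DQN-style training.
        return random.sample(list(self.data), batch_size)
```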
It clearly improves learning speed early on, but other techniques eventually catch up and end up outperforming it.
https://arxiv.org/pdf/1703.01988.pdf
Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.