leo-p / papers

Papers and their summaries (in issues)

Neural Episodic Control #21

Open leo-p opened 7 years ago

leo-p commented 7 years ago

https://arxiv.org/pdf/1703.01988.pdf

Deep reinforcement learning methods attain super-human performance in a wide range of environments. Such methods are grossly inefficient, often taking orders of magnitude more data than humans to achieve reasonable performance. We propose Neural Episodic Control: a deep reinforcement learning agent that is able to rapidly assimilate new experiences and act upon them. Our agent uses a semi-tabular representation of the value function: a buffer of past experience containing slowly changing state representations and rapidly updated estimates of the value function. We show across a wide range of environments that our agent learns significantly faster than other state-of-the-art, general purpose deep reinforcement learning agents.

leo-p commented 7 years ago

Summary:

They introduce a new network component, the DND (Differentiable Neural Dictionary): essentially a dictionary that accepts arbitrary keys (in particular, learned embeddings) and computes its output as a kernel-weighted average over the stored keys' values. Crucially, the whole lookup is differentiable.
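The lookup described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the class name and the defaults for `p` (number of neighbours) and `delta` are illustrative, while the inverse-distance kernel k(h, h_i) = 1 / (||h − h_i||² + δ) is the one the paper uses.

```python
import numpy as np

class DND:
    """Minimal sketch of a Differentiable Neural Dictionary (illustrative API)."""

    def __init__(self, p=2, delta=1e-3):
        self.keys, self.values = [], []
        self.p = p          # number of nearest neighbours used per lookup
        self.delta = delta  # kernel smoothing constant

    def write(self, key, value):
        # Append a (key, value) pair; the real DND also updates existing keys.
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(float(value))

    def lookup(self, key):
        key = np.asarray(key, dtype=float)
        dists = np.array([np.sum((key - k) ** 2) for k in self.keys])
        nearest = np.argsort(dists)[: self.p]
        # Inverse-distance kernel: k(h, h_i) = 1 / (||h - h_i||^2 + delta)
        w = 1.0 / (dists[nearest] + self.delta)
        w /= w.sum()
        # Kernel-weighted average of the neighbours' stored values
        return float(np.dot(w, np.array(self.values)[nearest]))
```

Because the output is a smooth function of the query key and the stored keys, gradients can flow back through the lookup into the embedding network.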

Architecture:

The network operates in two stages:

  1. A conventional CNN computes an embedding for every image.
  2. One DND per possible action (controller input) stores the embedding as key and the estimated Q-value as value.
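Putting the two stages together, acting greedily means querying each action's DND with the current embedding and picking the argmax. A self-contained sketch, assuming per-action memories are given as a hypothetical `{action: (keys, values)}` mapping and reusing the paper's inverse-distance kernel:

```python
import numpy as np

def dnd_q(h, keys, values, p=2, delta=1e-3):
    """Kernel-weighted Q estimate from one action's stored (keys, values)."""
    d = np.sum((keys - h) ** 2, axis=1)
    nn = np.argsort(d)[:p]                 # p nearest stored keys
    w = 1.0 / (d[nn] + delta)              # inverse-distance kernel weights
    return float(np.dot(w / w.sum(), values[nn]))

def act_greedy(h, memories):
    """memories: hypothetical {action: (keys, values)} dict, one DND per action."""
    return max(memories, key=lambda a: dnd_q(h, *memories[a]))
```

In the full agent `h` would be the CNN embedding of the current frame; here it is just a vector.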

They also use a replay buffer storing transition tuples (previous image, action, reward, next image), and training follows the standard approach of sampling mini-batches from this buffer.
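The values written into each action's DND are N-step Q-learning return estimates, which can be sketched as below. The function name and the example numbers are illustrative; the recursion just implements Q^(N) = Σ_{j<N} γ^j r_{t+j} + γ^N max_a' Q(s_{t+N}, a').

```python
def n_step_target(rewards, bootstrap, gamma=0.99):
    """N-step return: discounted sum of N rewards plus a bootstrapped tail.

    rewards:   list of the next N observed rewards [r_t, ..., r_{t+N-1}]
    bootstrap: max_a' Q(s_{t+N}, a'), the value estimate at the N-th state
    """
    target = bootstrap
    for r in reversed(rewards):
        target = r + gamma * target  # fold rewards in back-to-front
    return target
```

This target is what gets stored (or blended into an existing entry) under the embedding key for the action taken.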

*(Figure: NEC architecture diagram, screenshot from the paper.)*

Results:

Clearly improves learning speed early on, but other techniques eventually catch up and ultimately outperform it.