
Replacing RNN with Self-Attention Mechanism #21

Open rabaur opened 3 years ago

rabaur commented 3 years ago

Dear David Ha, dear Jürgen Schmidhuber,

Thank you for this inspirational blog post. I stumbled upon your paper while researching for my BSc thesis, which is concerned with training agents to navigate complex buildings. As you know, navigation is a complex task in which memory is of great importance.

Given the complexity of the task and the promising results of self-attention, I was wondering whether you have considered replacing the RNN with a self-attention mechanism. I reckon this would make the memory model more powerful while being computationally less expensive.
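
For concreteness, here is a minimal numpy sketch of what I mean (the dimensions, the single head, and the causal mask are just my own illustrative choices, not anything from your code): a causal self-attention layer over the history of [z_t, a_t] pairs that predicts the next latent, standing in where the MDN-RNN would be.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """One self-attention head over a sequence x of shape (T, d).
    A causal mask stops each step from attending to the future, so the
    layer plays a role analogous to the recurrent state in the MDN-RNN."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)               # (T, T) attention logits
    mask = np.triu(np.ones_like(scores), 1)       # 1s strictly above the diagonal
    scores = np.where(mask == 1, -1e9, scores)    # block attention to future steps
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                            # (T, d) context per time step

# Toy dimensions (my own choice): 32-d latent z, 3-d action, T = 10 steps.
T, z_dim, a_dim, d = 10, 32, 3, 64
rng = np.random.default_rng(0)
za = rng.normal(size=(T, z_dim + a_dim))          # sequence of [z_t, a_t] inputs
w_in = rng.normal(size=(z_dim + a_dim, d)) * 0.1  # embed to model width d
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
w_out = rng.normal(size=(d, z_dim)) * 0.1         # project back to a z prediction

h = causal_self_attention(za @ w_in, w_q, w_k, w_v)
z_next_pred = h @ w_out                           # predicted z_{t+1} for every step t
print(z_next_pred.shape)                          # (10, 32)
```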

Thank you for your consideration,
Raphaël Baur, BSc student, ETH Zürich

hardmaru commented 2 years ago

Hi Raphaël,

In later work, I've generally kept the RNN, but replaced the latent space bottleneck with other types of bottlenecks related to self-attention.

For example:

1) Inattentional Blindness bottleneck: https://attentionagent.github.io/

2) Screen shuffling bottleneck: https://attentionneuron.github.io/
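
To give a rough sense of the first kind of bottleneck, here is a toy numpy sketch (not the actual code from that work; the patch size, the random projections, and the top-k rule are simplified stand-ins): image patches score each other with self-attention, and the downstream controller only receives the coordinates of the few patches that attract the most attention, never the raw pixels.

```python
import numpy as np

def top_k_patch_bottleneck(frame, patch=16, k=10):
    """Toy attention bottleneck over image patches: patches 'vote' for each
    other via a self-attention matrix, and only the grid coordinates of the
    k most-voted patches are passed on to the controller."""
    h, w, _ = frame.shape
    rows, cols = h // patch, w // patch
    feats = np.stack([
        frame[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch].ravel()
        for r in range(rows) for c in range(cols)
    ])                                                   # (N, patch*patch*3) flattened patches
    rng = np.random.default_rng(0)
    d = 32
    w_q = rng.normal(size=(feats.shape[1], d)) * 0.01    # toy query projection
    w_k = rng.normal(size=(feats.shape[1], d)) * 0.01    # toy key projection
    scores = (feats @ w_q) @ (feats @ w_k).T / np.sqrt(d)  # (N, N) patch-to-patch votes
    att = np.exp(scores - scores.max(axis=-1, keepdims=True))
    att /= att.sum(axis=-1, keepdims=True)
    votes = att.sum(axis=0)                              # total attention each patch receives
    top = np.argsort(votes)[-k:][::-1]                   # indices of the k winning patches
    return np.stack([(i // cols, i % cols) for i in top])  # (k, 2) grid coordinates

frame = np.random.rand(96, 96, 3)                        # e.g. a CarRacing-sized frame
print(top_k_patch_bottleneck(frame))                     # patch locations for the controller
```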

Cheers.

rabaur commented 2 years ago

This is very insightful, thank you so much for your answer!