Closed. turmeric-blend closed this 3 years ago.
The goal of this layer is to "get rid" of the lookback
dimension in some hopefully meaningful way, so what comes out of this layer is no longer a time series. Yes, it is inspired by the attention mechanism. The biggest advantage over the other collapse layers is that it has learnable parameters. The intuition is that we take all the time steps (say, the previous 30 days) and compute their weighted average. The weights of this weighted average are, in general, different for each asset.
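For intuition only, here is a minimal NumPy sketch of that weighted-average idea, not the layer's actual implementation; the shapes, the fixed score matrix `w`, and the `softmax` helper are all assumptions for illustration (a real attention layer would compute the scores from the input via a small learnable projection rather than holding them fixed):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy input: 30 lookback days x 4 assets (hypothetical shapes and values).
x = rng.standard_normal((30, 4))

# Learnable scores, one per (day, asset) pair. In a real attention layer
# these would be produced from x by a learnable projection, not fixed.
w = rng.standard_normal((30, 4))

# Softmax over the lookback axis turns the scores into per-asset weights
# that sum to 1 down each column.
a = softmax(w, axis=0)            # shape (30, 4)

# Weighted average over time: the lookback dimension disappears.
collapsed = (a * x).sum(axis=0)   # shape (4,), one value per asset
```

Since the softmax axis is just a parameter here, the same construction could in principle collapse any chosen dimension, not only the lookback one.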
Hope it makes sense:)
I see. I guess in that sense it could be extended to collapse any dimension, depending on what we are trying to learn. Ok, thanks :)
Hi, may I ask what the intuition is behind the
AttentionCollapse
layer? It seems "superior" to the other collapse layers. My guess is that using attention collapse (instead of sum/average/max operations, etc.) is similar to how attention in transformers for natural language learns better than it would if the attention were replaced by sum/average/max operations?