Closed. turmeric-blend closed this 3 years ago.
The goal of this layer is to "get rid" of the lookback
dimension in some hopefully meaningful way, so what comes out of this layer is no longer a time series. Yes, it is inspired by the attention mechanism. The biggest advantage over the other collapse layers is that it has learnable parameters. The intuition is that we take all the time steps (say, the previous 30 days) and compute their weighted average. The weights of this weighted average are, in general, different for each asset.
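For intuition only, here is a minimal NumPy sketch of that weighted-average idea, not the layer's actual implementation; the shapes, the fixed score matrix `w`, and the `softmax` helper are all assumptions for illustration (a real attention layer would compute the scores from the input via a small learnable projection rather than holding them fixed):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy input: 30 lookback days x 4 assets (hypothetical shapes and values).
x = rng.standard_normal((30, 4))

# Learnable scores, one per (day, asset) pair. In a real attention layer
# these would be produced from x by a learnable projection, not fixed.
w = rng.standard_normal((30, 4))

# Softmax over the lookback axis turns the scores into per-asset weights
# that sum to 1 down each column.
a = softmax(w, axis=0)            # shape (30, 4)

# Weighted average over time: the lookback dimension disappears.
collapsed = (a * x).sum(axis=0)   # shape (4,), one value per asset
```

Since the softmax axis is just a parameter here, the same construction could in principle collapse any chosen dimension, not only the lookback one.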
Hope it makes sense:)
I see. I guess in that sense it could be extended to collapse any dimension, depending on what we are trying to learn. Ok, thanks :)
Hi, may I ask what the intuition is behind the
AttentionCollapse
layer? It seems "superior" to the other collapse layers. My guess is that using attention collapse (instead of sum/average/max operations, etc.) is similar to how attention in transformers for natural language learns better than it would if the attention were replaced by sum/average/max operations?