guojiajeremy / Dinomaly


questions about self.attn_drop and loose reconstruction. #9

Closed MinGiSa closed 1 month ago

MinGiSa commented 1 month ago

Can you explain why self.attn_drop is not used in LinearAttention2? Also, could you simplify the concept of loose reconstruction? I understand it as using a fuse layer to combine low-level and high-level attention maps to reconstruct the attention map in a non-local manner, but I don't fully grasp the concept.

guojiajeremy commented 1 month ago

LinearAttention2, with the reduced Q(KV) form, cannot support attn_drop because there is no explicit attention map. attn_drop drops elements of the attention map QK, which is available in LinearAttention in the (QK)V form. In addition, I think attn_drop is not really needed in linear attention, which already does not focus sharply.
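
For illustration, here is a minimal single-head sketch of the two forms (a simplification under my own assumptions, not the repository's exact implementation; multi-head handling and normalization terms are omitted). It shows why the (QK)V form has a place to apply attn_drop while the reduced Q(KV) form does not:

```python
import torch
import torch.nn as nn


class LinearAttentionQKV(nn.Module):
    """(QK)V form: an explicit N x N map exists, so attn_drop can be applied to it."""
    def __init__(self, dim, attn_drop=0.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.attn_drop = nn.Dropout(attn_drop)

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = q.softmax(dim=-1), k.softmax(dim=-2)   # kernel feature maps
        attn = q @ k.transpose(-2, -1)                 # explicit (B, N, N) attention map
        attn = self.attn_drop(attn)                    # dropout on attention elements
        return attn @ v


class LinearAttentionKVQ(nn.Module):
    """Q(KV) reduced form: K^T V is computed first as a (C, C) context,
    so there is never an N x N attention map to drop -- hence no attn_drop."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)

    def forward(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = q.softmax(dim=-1), k.softmax(dim=-2)
        kv = k.transpose(-2, -1) @ v                   # (B, C, C) context, O(N * C^2)
        return q @ kv
```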

We do not reconstruct attention maps (QK in the Attention modules); we reconstruct features. The idea of loose reconstruction is not about what is input to the decoder; it is about the reconstruction subject (decoder features) and object (encoder features). Instead of letting each decoder layer reconstruct its corresponding encoder layer, as is conventional, we propose to let the average of the decoder layers reconstruct the average of the encoder layers. There is no fuse layer (the .fuse function in the code is simply an average).

E.g., previous reconstruction paradigm: D_1 ==> E_1, D_2 ==> E_2, D_3 ==> E_3, D_4 ==> E_4

Ours: (D_1+D_2+D_3+D_4) ==> (E_1+E_2+E_3+E_4)
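
As a rough sketch of the difference in the loss (my own illustration, assuming same-shape feature maps and a cosine reconstruction distance; names like `recon_loss` and `loose_reconstruction` are hypothetical, not the actual code):

```python
import torch
import torch.nn.functional as F


def recon_loss(d_feat, e_feat):
    """Cosine reconstruction distance between two feature maps of shape (B, C, H, W)."""
    d = F.normalize(d_feat.flatten(2), dim=1)
    e = F.normalize(e_feat.flatten(2), dim=1)
    return (1 - (d * e).sum(dim=1)).mean()


def tight_reconstruction(dec_feats, enc_feats):
    """Conventional paradigm: D_1 ==> E_1, D_2 ==> E_2, ... layer by layer."""
    return sum(recon_loss(d, e) for d, e in zip(dec_feats, enc_feats)) / len(dec_feats)


def loose_reconstruction(dec_feats, enc_feats):
    """Loose paradigm: the averaged decoder features reconstruct the averaged
    encoder features; the 'fuse' step is just this average, not a learned layer."""
    d_avg = torch.stack(dec_feats).mean(dim=0)
    e_avg = torch.stack(enc_feats).mean(dim=0)   # encoder features are typically detached/frozen
    return recon_loss(d_avg, e_avg)
```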