shvdiwnkozbw / Self-supervised-Video-Concept

Code for Static and Dynamic Concepts for Self-supervised Video Representation Learning.

About Cross Attention #3

Open BNU-IVC opened 3 weeks ago

BNU-IVC commented 3 weeks ago
        # attention
        q = q * self.scale  # scale queries before the dot product
        attn_logits = torch.einsum('bnd,bld->bln', q, k)  # (B, L, N)
        attn = self.softmax(attn_logits)  # softmax over the query (N) dimension
        attn = attn + 1e-8  # avoid zeros before the L1 normalization below
        attn = attn / attn.sum(dim=-2, keepdim=True)  # L1-normalize over locations (L)

        # update templates with the attention-weighted values
        templates = torch.einsum('bld,bln->bnd', v, attn) + templates_prev

Thanks for your contribution. I'm confused about the cross attention in the local transformer. It seems that the softmax is applied along the query dimension rather than the key dimension, so it has no aggregation effect over the features. Is this an error, or is there a specific principle behind it?

shvdiwnkozbw commented 3 weeks ago

Hi, the softmax normalization over the query dimension is inherited from slot attention [1]. It introduces competition between the queries, encouraging different query tokens to take over different feature components and thereby discriminate different visual semantics or concepts. In our architecture, we follow this scheme to update the concept-related queries.

[1] Locatello, Francesco, et al. "Object-centric learning with slot attention." Advances in neural information processing systems 33 (2020): 11525-11538.
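
For anyone else reading this thread, below is a minimal, self-contained sketch of the normalization scheme discussed above. It is not the repo's exact module: the function and tensor names are illustrative, and the shapes are inferred from the einsum strings in the quoted snippet.

```python
import torch
import torch.nn.functional as F

def slot_style_cross_attention(q, k, v, templates_prev, eps=1e-8):
    """Slot-attention-style update: softmax over queries, L1 norm over locations.

    q:              (B, N, D) concept/slot queries
    k, v:           (B, L, D) keys and values from L feature locations
    templates_prev: (B, N, D) query states from the previous iteration
    """
    scale = q.shape[-1] ** -0.5
    attn_logits = torch.einsum('bnd,bld->bln', q * scale, k)  # (B, L, N)

    # Softmax over the query (N) axis: each location distributes its attention
    # mass across the queries, so queries compete for features, unlike standard
    # cross attention where the softmax runs over the key/location axis.
    attn = F.softmax(attn_logits, dim=-1)

    # Re-normalize over locations (L1 along the L axis) so each query's
    # weights sum to 1 before aggregating values; eps avoids division by zero.
    attn = attn + eps
    attn = attn / attn.sum(dim=-2, keepdim=True)

    # Weighted mean of the values per query, plus a residual connection.
    return torch.einsum('bld,bln->bnd', v, attn) + templates_prev


# Example shapes (illustrative only):
q = templates_prev = torch.randn(2, 8, 64)   # 8 concept queries, dim 64
k = v = torch.randn(2, 196, 64)              # 196 spatio-temporal tokens
out = slot_style_cross_attention(q, k, v, templates_prev)
print(out.shape)  # torch.Size([2, 8, 64])
```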