minghangz / cnm

Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining

Formal representation of conditional attention #6

Open EricPaul03 opened 6 months ago

EricPaul03 commented 6 months ago

Hello, could you give a formal description of how the mask is introduced into the attention mechanism? Can it be understood as something like (QK^T * mask) V, with the mask applied to the attention scores? Or can the mask be multiplied directly with K or V first?
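For readers comparing the two options in the question, here is a minimal sketch of why they generally differ. It is not taken from the cnm repository; the function names (`score_masked_attention`, `key_zeroed_attention`), the 0/1 key mask, and the toy inputs are all assumptions made for illustration. It uses standard scaled dot-product attention and assumes at least one key is kept.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def score_masked_attention(q, k, v, mask):
    # Hypothetical variant 1: mask the scores QK^T before the softmax.
    # mask[j] == 1 keeps key j, mask[j] == 0 hides it (score set to -inf).
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [
        softmax([s / scale if keep else -math.inf for s, keep in zip(row, mask)])
        for row in scores
    ]
    return matmul(weights, v), weights

def key_zeroed_attention(q, k, v, mask):
    # Hypothetical variant 2: multiply the mask into K first, then attend as usual.
    # A zeroed key yields a score of 0, which the softmax still turns into
    # a positive weight, so the hidden position is not actually excluded.
    scale = math.sqrt(len(q[0]))
    k_zeroed = [[x * keep for x in row] for row, keep in zip(k, mask)]
    scores = matmul(q, transpose(k_zeroed))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy inputs (made up for this example): one query, three keys, last key masked.
q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [1, 1, 0]

out_scores, w_scores = score_masked_attention(q, k, v, mask)
out_keys, w_keys = key_zeroed_attention(q, k, v, mask)

print("weights, mask on scores:", w_scores[0])
print("weights, mask on K     :", w_keys[0])
```

In this sketch the masked key receives exactly zero weight only when the mask is applied to the scores. Zeroing rows of K leaves a nonzero weight on the masked position, and zeroing rows of V alone would remove that position's contribution without renormalising the remaining weights. Whether the repository's conditional attention follows either form is for the authors to confirm.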