Open themingcha opened 1 year ago
I would like to see where the model focuses attention on video features for text queries. How can I visualize the cross-attention heatmap?
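To make the question concrete, here is a minimal sketch of the kind of heatmap I mean. It is not this repository's code: it assumes the cross-attention is standard scaled dot-product attention with text tokens as queries and video clips as keys, and the names `cross_attention_weights`, `render_heatmap`, `text_queries`, and `video_keys` are hypothetical, as is the toy data.

```python
import math


def cross_attention_weights(text_queries, video_keys):
    """Return softmax(Q K^T / sqrt(d)) as a list of rows, one row per text query.

    Assumption: plain scaled dot-product attention; the real model may differ.
    """
    d = len(video_keys[0])
    weights = []
    for q in text_queries:
        # Scaled dot product between this query and every video key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in video_keys]
        # Numerically stable softmax over the video axis.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def render_heatmap(weights, shades=" .:-=+*#%@"):
    """Render each row as text; darker characters mean more attention.

    Each row is scaled by its own maximum so the peak is always visible.
    """
    lines = []
    for row in weights:
        peak = max(row)
        lines.append("".join(shades[int(w / peak * (len(shades) - 1))]
                             for w in row))
    return "\n".join(lines)


# Hypothetical toy inputs: 2 text-query vectors, 3 video-clip vectors.
text_queries = [[1.0, 0.0], [0.0, 1.0]]
video_keys = [[4.0, 0.0], [0.0, 4.0], [2.0, 2.0]]

weights = cross_attention_weights(text_queries, video_keys)
print(render_heatmap(weights))
```

With a real model the weights would come from the attention layer instead of being recomputed. For example, if the layer happens to be a `torch.nn.MultiheadAttention`, its forward pass returns the attention weights alongside the output when `need_weights=True`, and the resulting (queries x clips) matrix could be passed to `matplotlib.pyplot.imshow`. Whether this model exposes its weights that way is exactly what I am unsure about.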