rxtan2 / Koala-video-llm


Regarding Attention Heatmap #5

Open rbsohee opened 1 month ago

rbsohee commented 1 month ago

Hi, thank you for this interesting work :)

I was wondering how the "attention heatmap" in the paper was drawn. If I have understood your method correctly, the learnable parameters are added only to the "Video Q-former", which cross-attends to the 32 x T queries generated by the frozen "Visual Q-former". The 32 visual queries attend to different regions of each frame, but since they are frozen, their attention maps should not have changed.
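For concreteness, here is roughly how I picture the setup, as a minimal runnable sketch with a single stand-in cross-attention layer (all names, dimensions, and the one-layer structure are my assumptions, not the repo's actual code):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: B clips, T frames, 32 queries per frame, hidden dim D.
B, T, D = 2, 8, 768
visual_queries = torch.randn(B, 32 * T, D)        # frozen Visual Q-former output
video_queries = nn.Parameter(torch.randn(64, D))  # learnable Video Q-former queries

# Stand-in for one Video Q-former cross-attention layer.
cross_attn = nn.MultiheadAttention(D, num_heads=12, batch_first=True)
q = video_queries.unsqueeze(0).expand(B, -1, -1)
out, attn_weights = cross_attn(q, visual_queries, visual_queries)
# attn_weights: [B, 64, 32*T], i.e. which frozen visual tokens each learnable
# video query attends to (averaged over heads by default).
```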

It would really help if you could share the code or method you used to visualize the attention maps.

rxtan2 commented 4 weeks ago

Hi rbsohee, thank you very much for your interest in our work! I apologize for the delay; we have been busy with some deadlines.

We use a simplified method similar to attention rollout to extract the attention weights from the Video Q-former. You are right that the 32 visual queries are frozen. However, we append learnable queries that interact with the visual queries through the self-attention layers, which changes the representations of all the queries and, in turn, the attention weights. Due to the complexity of the model, we initially used this simplified version and are now evaluating better ways to extract such attention maps.

We are cleaning up and testing the script for extracting the attention maps and will release it for public use once it is ready.