hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Hidden State understanding #32

Open sivaji123256 opened 9 months ago

sivaji123256 commented 9 months ago

Hi @xinggangw @Unrealluver @ifzhang @LegendBC, thanks for releasing this great work. I am trying to interpret the model so that its predictions are more explainable. Using the pretrained image-classification weights, how can I record, during training or inference, the attention weights generated within the bidirectional SSM at each time step? I have gone through mamba_model.py, where we have hidden states and residuals. Can I use those directly to understand the model's predictions, or do I need to extract other details via main.py or mamba_model.py? If so, what directions could we look into, e.g., saliency maps? Any suggestions would be highly useful. Thanks in advance.
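A common way to record per-layer hidden states from a pretrained PyTorch model, without modifying its forward code, is to register forward hooks on the blocks of interest. The sketch below is a minimal, self-contained illustration using a toy module; `TinyModel` and its `layers` attribute are placeholders, not Vim's actual classes — in the real repo you would iterate over the Mamba blocks exposed by the model (check mamba_model.py for the attribute name) and hook those instead.

```python
import torch
import torch.nn as nn

# Toy stand-in for a stack of Mamba blocks; the structure (a ModuleList of
# blocks applied in sequence) mirrors typical backbone code, but the names
# here are assumptions for illustration only.
class TinyModel(nn.Module):
    def __init__(self, dim=8, depth=3):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

captured = []  # filled with one tensor per hooked layer during forward()

def save_hidden_state(module, inputs, output):
    # Detach and move to CPU so recording does not hold the graph or GPU memory.
    captured.append(output.detach().cpu())

model = TinyModel()
handles = [layer.register_forward_hook(save_hidden_state)
           for layer in model.layers]

with torch.no_grad():
    model(torch.randn(1, 8))  # captured now holds each layer's output

for h in handles:  # always remove hooks once you are done recording
    h.remove()
```

After the forward pass, `captured` holds one tensor per layer, which you can then inspect, visualize, or feed into an attribution method such as a saliency map. Note this records block outputs, not the SSM's internal per-time-step state; for that you would need to hook or modify the inner scan itself.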

AmeenAli commented 9 months ago

Hi Sivaji! @sivaji123256 You can check our very recent work on interpreting Mamba and extracting its implicit attention :) https://github.com/AmeenAli/HiddenMambaAttn