hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

can't find the bi-directional mamba #11

Closed pengzhangzhi closed 9 months ago

pengzhangzhi commented 9 months ago

Hi there, your vim is impressive! I had a great read of the paper and the code! I am trying to follow the bi-directional mamba but can't find where the code is. I looked https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py

eclipse0922 commented 9 months ago

I am not the author of the paper, but as far as I understand, they modified the `mamba_ssm/mamba_simple.py` code to support bidirectional Mamba.

```python
# bidirectional
assert bimamba_type == "v2"

if self.use_fast_path and inference_params is None:  # Doesn't support outputting the states
    if self.bimamba_type == "v2":
        A_b = -torch.exp(self.A_b_log.float())
```

https://github.com/hustvl/Vim/blob/main/mamba/mamba_ssm/modules/mamba_simple.py

The original Mamba implementation does not have the variable and associated code above. https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py
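To make the difference concrete, here is a minimal toy sketch of the idea behind the `bimamba_type == "v2"` branch: a second parameter set (cf. `A_b` / `A_b_log` above) scans the reversed sequence, and the two outputs are summed. The `ssm_scan` recurrence below is a plain linear state-space scan for illustration only, not the selective, hardware-aware Mamba kernel; all function and variable names here are hypothetical.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Toy diagonal linear SSM scan over a 1D sequence:
    h_t = A * h_{t-1} + B * x_t,  y_t = C @ h_t.
    Illustration only; not the real Mamba selective scan."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

def bidirectional_scan(x, params_fwd, params_bwd):
    """Sketch of the bidirectional idea: one scan runs forward,
    a second scan (with its own parameters, cf. A_b) runs over the
    reversed sequence and is flipped back, then both are summed."""
    y_fwd = ssm_scan(x, *params_fwd)
    y_bwd = ssm_scan(x[::-1], *params_bwd)[::-1]
    return y_fwd + y_bwd
```

With shared parameters and a palindromic input, the combined output is symmetric, which is one quick way to see that information now flows in both directions.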

pengzhangzhi commented 9 months ago

oh yeah! thanks!

87831743Sakura commented 8 months ago

The code indeed contains the implementation of the bidirectional Mamba module. However, why are these lines never executed when I debug?

87831743Sakura commented 8 months ago

The link https://github.com/hustvl/Vim/blob/main/mamba/mamba_ssm/modules/mamba_simple.py is also broken.

arelkeselbri commented 6 months ago

Try https://github.com/hustvl/Vim/blob/main/mamba-1p1p1/mamba_ssm/modules/mamba_simple.py

MPCheng-ZW commented 2 months ago

Hi!

Do you know what's the meaning of self.if_bidirectional in models_mamba.py? @eclipse0922 @87831743Sakura @arelkeselbri @pengzhangzhi @xinggangw

Thanks!

eclipse0922 commented 2 months ago

The meaning of self.if_bidirectional in models_mamba.py relates to a key innovation in the Vim (Vision Mamba) model, as described in the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". The original Mamba model was designed for 1D sequence data like text or audio, typically processed in a forward direction. However, images are inherently 2D data, which presents unique challenges when adapting sequence models to vision tasks. Vim addresses this by using a bidirectional approach:

Images are first flattened into a 1D sequence of patches. Instead of processing this sequence only in one direction, Vim uses bidirectional State Space Models (SSMs) in each block. This means each block processes the sequence both forward and backward, allowing it to capture spatial relationships and long-range dependencies from multiple perspectives.

The self.if_bidirectional flag likely controls whether this bidirectional processing is enabled. When true, it would activate both the forward and backward SSMs, as well as the bidirectional convolutions described in the Vim architecture. This bidirectional approach is crucial for tasks like object detection and segmentation, where understanding the global context and spatial structure of an image is important. It allows Vim to more effectively capture the 2D nature of image data within the framework of a sequence model. It's worth noting that there are various approaches to adapting Mamba-like models for vision tasks, with different methods for processing 2D data. The bidirectional approach in Vim is one specific strategy to address this challenge.
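Under the interpretation above, the flag would gate the backward branch roughly like the following sketch. This is a hypothetical stand-in, not the exact Vim code: `mix_fwd` and `mix_bwd` are placeholder names for the forward and backward SSM branches, and the sequences are plain Python lists for simplicity.

```python
def block_forward(x, mix_fwd, mix_bwd, if_bidirectional):
    """Sketch of how an if_bidirectional flag could gate the two
    sequence mixers in a block (illustrative names, not Vim's code).
    x is a list of tokens; mix_fwd/mix_bwd map a sequence to a
    sequence of the same length."""
    if not if_bidirectional:
        # unidirectional: only the forward branch runs
        return mix_fwd(x)
    # forward branch on x; backward branch on the reversed
    # sequence, flipped back so positions line up; then summed
    y_fwd = mix_fwd(x)
    y_bwd = mix_bwd(x[::-1])[::-1]
    return [a + b for a, b in zip(y_fwd, y_bwd)]
```

With an identity mixer in both branches, the bidirectional path simply doubles each token, while the unidirectional path returns the input unchanged; in the real model each branch is a parameterized SSM, so the two directions contribute different context.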

Please read the VIM and recent survey papers for better understanding.

MPCheng-ZW commented 2 months ago


Thanks for your quick response.

So, should the if_bidirectional argument be set to True rather than False?

Thanks! @eclipse0922

eclipse0922 commented 2 months ago

I think so, yes, if you want to take advantage of Vim's key idea.

MPCheng-ZW commented 2 months ago

Okay! I got it, thank you!