Closed — pengzhangzhi closed this issue 9 months ago.
I am not the author of the paper, but as far as I understand, they modified the mamba_ssm/mamba_simple code to implement bidirectional Mamba:
```python
# bidirectional
assert bimamba_type == "v2"
if self.use_fast_path and inference_params is None:  # Doesn't support outputting the states
    if self.bimamba_type == "v2":
        A_b = -torch.exp(self.A_b_log.float())
```
https://github.com/hustvl/Vim/blob/main/mamba/mamba_ssm/modules/mamba_simple.py
The original Mamba implementation does not have this variable or the associated code: https://github.com/state-spaces/mamba/blob/main/mamba_ssm/modules/mamba_simple.py
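To illustrate the difference: in Mamba, the state matrix is stored as `A_log` and materialized as `A = -exp(A_log)`; Vim's "v2" bimamba additionally keeps a second parameter `A_b_log` for the backward scan. Here is a minimal sketch of that parameterization (the class name and sizes are made up for illustration; only the `A_log`/`A_b_log` pattern mirrors the linked code):

```python
import torch
import torch.nn as nn

class BiMambaParams(nn.Module):
    """Sketch (not the actual Vim module): bimamba 'v2' keeps a second
    state matrix A_b for the backward scan, stored the same way as A."""

    def __init__(self, d_inner=4, d_state=3):
        super().__init__()
        # A is negative real, so it is stored as log(-A)
        A = torch.arange(1, d_state + 1, dtype=torch.float32).repeat(d_inner, 1)
        self.A_log = nn.Parameter(torch.log(A))
        # Independent copy for the backward direction, initialized identically
        self.A_b_log = nn.Parameter(torch.log(A.clone()))

    def materialize(self):
        A = -torch.exp(self.A_log.float())      # used by the forward scan
        A_b = -torch.exp(self.A_b_log.float())  # used by the backward scan
        return A, A_b

p = BiMambaParams()
A, A_b = p.materialize()
assert A.shape == A_b.shape == (4, 3)
assert (A < 0).all() and (A_b < 0).all()
```

The two matrices are learned independently, so the forward and backward scans can specialize to different dynamics.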
oh yeah! thanks!
The code indeed contains the implementation of the bidirectional Mamba module. However, why were these lines never executed when I stepped through the code while debugging?
The link https://github.com/hustvl/Vim/blob/main/mamba/mamba_ssm/modules/mamba_simple.py is also broken.
Hi!
Do you know what's the meaning of self.if_bidirectional in models_mamba.py? @eclipse0922 @87831743Sakura @arelkeselbri @pengzhangzhi @xinggangw
Thanks!
The meaning of self.if_bidirectional in models_mamba.py relates to a key innovation in the Vim (Vision Mamba) model, as described in the paper "Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model". The original Mamba model was designed for 1D sequence data like text or audio, typically processed in a forward direction. However, images are inherently 2D data, which presents unique challenges when adapting sequence models to vision tasks. Vim addresses this by using a bidirectional approach:
Images are first flattened into a 1D sequence of patches. Instead of processing this sequence only in one direction, Vim uses bidirectional State Space Models (SSMs) in each block. This means each block processes the sequence both forward and backward, allowing it to capture spatial relationships and long-range dependencies from multiple perspectives.
The self.if_bidirectional flag likely controls whether this bidirectional processing is enabled. When true, it would activate both the forward and backward SSMs, as well as the bidirectional convolutions described in the Vim architecture. This bidirectional approach is crucial for tasks like object detection and segmentation, where understanding the global context and spatial structure of an image is important. It allows Vim to more effectively capture the 2D nature of image data within the framework of a sequence model. It's worth noting that there are various approaches to adapting Mamba-like models for vision tasks, with different methods for processing 2D data. The bidirectional approach in Vim is one specific strategy to address this challenge.
Please read the Vim paper and recent survey papers for a better understanding.
Thanks for your quick response.
So, should the if_bidirectional argument be set to True rather than False?
Thanks! @eclipse0922
I think so, yes, if you want to take advantage of Vim's key idea.
Okay! I got it, thank u!
Hi there, your Vim is impressive! I had a great read of the paper and the code! I am trying to follow the bidirectional Mamba but can't find where the code is. I looked at https://github.com/hustvl/Vim/blob/main/vim/models_mamba.py