hustvl / Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Mamba Structure #57

Open yunzqq opened 3 months ago

yunzqq commented 3 months ago

It seems that each block contains a bidirectional Mamba. Across the blocks, does the 1st block take the original input and the 2nd block take the flipped input, with this pattern repeating?

So why does each block contain both a forward and a backward pass, instead of using one block for forward and one block for backward and repeating that pattern?
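To make the two layouts concrete, here is a toy sketch in plain Python. It is not the Vim code: `causal_scan` is a hypothetical stand-in for a Mamba layer (a simple decaying recurrence), and all function names are made up for illustration. It only shows which positions a single input token can reach under each layout.

```python
# Toy sketch only: `causal_scan` is a hypothetical stand-in for a Mamba layer,
# not the Vim implementation.

def causal_scan(xs, decay=0.5):
    """Causal recurrence h[t] = decay * h[t-1] + x[t]; output t sees inputs <= t."""
    out, h = [], 0.0
    for x in xs:
        h = decay * h + x
        out.append(h)
    return out

def backward_scan(xs, decay=0.5):
    """Flip the sequence, scan causally, flip back; output t sees inputs >= t."""
    return causal_scan(xs[::-1], decay)[::-1]

def bidirectional_block(xs):
    """Layout A: a single block runs both directions and sums the results."""
    fwd = causal_scan(xs)
    bwd = backward_scan(xs)
    return [f + b for f, b in zip(fwd, bwd)]

def alternating_blocks(xs, depth):
    """Layout B: block 0 scans forward, block 1 scans backward, and so on."""
    for i in range(depth):
        xs = causal_scan(xs) if i % 2 == 0 else backward_scan(xs)
    return xs

def influenced_positions(fn, length, src):
    """Output positions that change when a unit impulse is placed at `src`."""
    baseline = fn([0.0] * length)
    impulse = [0.0] * length
    impulse[src] = 1.0
    out = fn(impulse)
    return [t for t in range(length) if out[t] != baseline[t]]

if __name__ == "__main__":
    print(influenced_positions(bidirectional_block, 5, 2))
    print(influenced_positions(lambda s: alternating_blocks(s, 1), 5, 2))
    print(influenced_positions(lambda s: alternating_blocks(s, 2), 5, 2))
```

In this toy, one bidirectional block lets the token at position 2 reach all five positions, while the alternating layout reaches only positions 2 to 4 after one block and needs a second block to cover the whole sequence. Whether that is the authors' actual motivation is not something this sketch can confirm.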

jsrdcht commented 1 month ago

Dude, you found the point. I'm confused too