hustvl / Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Why repeat the backward block? #87

Open jsrdcht opened 1 month ago

jsrdcht commented 1 month ago

Each "v2" Mamba block contains out_a and out_b, which is both forward and backward, but in the for loop here, we process two Mamba blocks at the same time, each has its out out_a and out_b, but the input for the second Mamba Block is flipped, which is qutie confusing, does that mean the flipped input for the second Mamba Block is not related to Mamba Block itself and mroe of a training mechanisim? Meaning, if the for loop processes one layer at a time, wouldn't a Mamba Block do a forward and backward SSM pass?

Because of this, the default depth for the small model is 24, which is very heavy on compute resources. So why is it done this way?

The same question has been raised in #71 and #57.