hustvl / Vim

Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Why repeat the backward block? #87

Open jsrdcht opened 1 month ago

jsrdcht commented 1 month ago

Each "v2" Mamba block contains out_a and out_b, which is both forward and backward, but in the for loop here, we process two Mamba blocks at the same time, each has its out out_a and out_b, but the input for the second Mamba Block is flipped, which is qutie confusing, does that mean the flipped input for the second Mamba Block is not related to Mamba Block itself and mroe of a training mechanisim? Meaning, if the for loop processes one layer at a time, wouldn't a Mamba Block do a forward and backward SSM pass?

Because of this, the default depth for the small model is 24, which is very heavy on compute resources. So why is it done this way?

The same question has been raised in #71 and #57.