hustvl / Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Apache License 2.0

Difference between with/without fused_add_norm #6

Open JiarunLiu opened 5 months ago

JiarunLiu commented 5 months ago

Hi @Unrealluver,

I saw that you fused the add and norm operations in the Block class. I'm unsure of the difference between fused_add_norm=True and fused_add_norm=False. More specifically, can I simply treat fused_add_norm_fn as a fused equivalent of the following code?

https://github.com/hustvl/Vim/blob/06c50090534d7a9e18142af4d57bb635a3085edf/vim/models_mamba.py#L75-L83

By the way, have you tried LN -> Mixer -> Add, as a standard block does? Would it differ from Add -> LN -> Mixer in accuracy or speed?

Unrealluver commented 4 months ago

Hi Jiarun,

For the first question, you can regard fused_add_norm as a faster version of the code you linked. For the second question, we also use LN -> Mixer -> Add for each block.
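To illustrate why the two orderings in this thread can describe the same computation, here is a minimal pure-Python sketch. It is not the repository's code: `layer_norm`, `toy_mixer`, `add_norm_unfused`, and both `run_*` functions are hypothetical stand-ins. It assumes the unfused path adds the incoming hidden state to the residual and then normalizes, and that the stack ends with one last add of hidden state and residual. Under those assumptions, Add -> LN -> Mixer with a carried residual gives the same result as LN -> Mixer -> Add on a single stream.

```python
import math

def layer_norm(x, eps=1e-5):
    # Plain layer norm over a list of floats (no learned weight or bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_mixer(x, scale):
    # Hypothetical stand-in for the mixer; any deterministic function works here.
    return [scale * v + 0.1 for v in x]

def add_norm_unfused(hidden, residual):
    # Assumed unfused path: add first, then normalize, and return both values.
    residual = hidden if residual is None else [h + r for h, r in zip(hidden, residual)]
    return layer_norm(residual), residual

def run_add_ln_mixer(x, scales):
    # Add -> LN -> Mixer: each block receives (hidden, residual) from the previous one.
    hidden, residual = x, None
    for s in scales:
        normed, residual = add_norm_unfused(hidden, residual)
        hidden = toy_mixer(normed, s)
    # Assumed final add that closes the last block's residual connection.
    return [h + r for h, r in zip(hidden, residual)]

def run_ln_mixer_add(x, scales):
    # LN -> Mixer -> Add: the standard pre-norm residual block on one stream.
    stream = x
    for s in scales:
        mixed = toy_mixer(layer_norm(stream), s)
        stream = [v + m for v, m in zip(stream, mixed)]
    return stream

if __name__ == "__main__":
    x = [1.0, 2.0, 4.0]
    scales = [0.5, -1.5]
    a = run_add_ln_mixer(x, scales)
    b = run_ln_mixer_add(x, scales)
    print(all(math.isclose(p, q, rel_tol=1e-9) for p, q in zip(a, b)))
```

The sketch only covers the arithmetic. A fused kernel would perform the add and the norm in a single pass for speed, and details such as dtype handling or drop path are left out here.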