I saw that you fused the add and norm operations in the Block class. I'm unsure of the difference between fused_add_norm=True and fused_add_norm=False. More specifically, can I simply treat fused_add_norm_fn as an integration of the following code? https://github.com/hustvl/Vim/blob/06c50090534d7a9e18142af4d57bb635a3085edf/vim/models_mamba.py#L75-L83
By the way, have you tried using LN -> Mixer -> Add, as a standard block does? Would it differ from Add -> LN -> Mixer in accuracy or speed?
For the first question, you can regard fused_add_norm as a faster version of the code you linked.
For the second question, we also use LN -> Mixer -> Add in each block.
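To make the second point concrete, here is a minimal pure-Python sketch (not the repository's code) of why an Add -> LN -> Mixer block that passes a (hidden, residual) pair to the next block can give the same result as a stack of standard LN -> Mixer -> Add blocks. Everything in it is an assumption for illustration: `toy_mixer` is a hypothetical stand-in for the real mixer, `layer_norm` has no learned weights, drop path is omitted, and the final add before the last norm is assumed to happen once after the last block.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm over a list of floats, without learned scale or shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def toy_mixer(x):
    """Hypothetical stand-in for the mixer; any deterministic function works."""
    return [math.tanh(2.0 * v) + 0.1 * x[i - 1] for i, v in enumerate(x)]

def add_norm_mixer_block(hidden, residual):
    """Add -> LN -> Mixer, carrying (hidden, residual) between blocks."""
    if residual is None:
        residual = hidden  # first block: nothing to add yet
    else:
        residual = [r + h for r, h in zip(residual, hidden)]
    hidden = toy_mixer(layer_norm(residual))
    return hidden, residual

def prenorm_block(x):
    """Standard pre-norm block: LN -> Mixer -> Add."""
    return [a + b for a, b in zip(x, toy_mixer(layer_norm(x)))]

def run_add_norm_mixer(x, depth):
    hidden, residual = x, None
    for _ in range(depth):
        hidden, residual = add_norm_mixer_block(hidden, residual)
    # Assumed final add, applied once after the last block.
    return [r + h for r, h in zip(residual, hidden)]

def run_prenorm(x, depth):
    for _ in range(depth):
        x = prenorm_block(x)
    return x

x0 = [0.5, -1.0, 2.0, 0.25]
out_a = run_add_norm_mixer(x0, depth=3)
out_b = run_prenorm(x0, depth=3)
max_diff = max(abs(a - b) for a, b in zip(out_a, out_b))
print(max_diff)
```

Under these assumptions the two orderings only differ in where the add is written: the residual carried into block l+1 is the same tensor a pre-norm stack would produce after block l. The fused path would then be a speed optimization of the add and norm steps rather than a different architecture.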
Hi @Unrealluver,
> I saw that you fused the add and norm operations in the Block class. I'm unsure of the difference between fused_add_norm=True and fused_add_norm=False. More specifically, can I simply treat fused_add_norm_fn as an integration of the following code? https://github.com/hustvl/Vim/blob/06c50090534d7a9e18142af4d57bb635a3085edf/vim/models_mamba.py#L75-L83
> By the way, have you tried using LN -> Mixer -> Add, as a standard block does? Would it differ from Add -> LN -> Mixer in accuracy or speed?