Currently the mBART backbone code I use has pre-norm, which is layer(norm(input))+input, whereas some people seem to say that post-norm, which is norm(layer(input)+input), might be better for zero-shot. Lord alone knows what's going to be useful when.
Having a flag to control pre- and post-norm in the encoder and decoder would be perfect.
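
To make the request concrete, here is a minimal, framework-agnostic sketch of what such a flag could look like. The names `residual_block` and `normalize_before` are hypothetical and not taken from any existing code; in a real model the sub-layer and the norm would be modules (self-attention, feed-forward, LayerNorm) rather than plain functions.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize a vector to zero mean and unit variance (no learned scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def residual_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    normalize_before: bool,  # hypothetical flag: True = pre-norm, False = post-norm
    norm: Callable[[Vector], Vector] = layer_norm,
) -> Vector:
    """Apply one sub-layer with a residual connection in either norm placement."""
    if normalize_before:
        # Pre-norm: layer(norm(input)) + input
        return [a + b for a, b in zip(sublayer(norm(x)), x)]
    # Post-norm: norm(layer(input) + input)
    return norm([a + b for a, b in zip(sublayer(x), x)])


# Toy sub-layer standing in for attention or feed-forward.
def double(v: Vector) -> Vector:
    return [2.0 * t for t in v]


x = [1.0, 2.0, 3.0]
pre = residual_block(x, double, normalize_before=True)
post = residual_block(x, double, normalize_before=False)
```

Separate flags for the encoder and the decoder would then just be two booleans passed down to the respective layer stacks. One thing to keep in mind: pre-norm stacks usually add a final norm after the last layer, while post-norm stacks do not need one, so the flag would probably have to control that as well.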