prajdabre / yanmtt

Yet Another Neural Machine Translation Toolkit
MIT License

Add post-norm to the model #51

Open prajdabre opened 2 years ago

prajdabre commented 2 years ago

Currently the mBART backbone code I use has pre-norm, which is layer(norm(input)) + input, whereas some people seem to say that post-norm, which is norm(layer(input) + input), might be better for zero-shot. Lord alone knows what's going to be useful when.
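For reference, a minimal sketch of the two variants in PyTorch (the `ResidualBlock` wrapper and its `post_norm` flag are hypothetical illustrations, not yanmtt's or HuggingFace's actual API):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Wraps one transformer sublayer with a residual connection and
    LayerNorm, switchable between pre-norm and post-norm placement."""

    def __init__(self, d_model, sublayer, post_norm=False):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer
        self.post_norm = post_norm

    def forward(self, x):
        if self.post_norm:
            # post-norm: normalize after the residual addition
            return self.norm(self.sublayer(x) + x)
        # pre-norm: normalize the input to the sublayer, then add residual
        return self.sublayer(self.norm(x)) + x

# Example: wrap a feed-forward sublayer either way.
d_model = 8
ffn = nn.Linear(d_model, d_model)
pre = ResidualBlock(d_model, ffn, post_norm=False)
post = ResidualBlock(d_model, ffn, post_norm=True)
x = torch.randn(2, 4, d_model)
assert pre(x).shape == post(x).shape == x.shape
```

Both placements preserve shapes, so a single constructor flag (as requested below) is enough to switch between them per encoder/decoder layer.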

Having a flag to control pre- and post-norm in the encoder and decoder would be perfect.