Hi! great work.
I am reading the paper and the code base side by side and noticed that the encoder and decoder layer specifications differ. The paper [Sec. 3.3, Network Training] says batch normalization and Tanh are used, but in the most recent update this was changed to a custom layer normalization and ELU. Is there a reason for this change, and what are the qualitative differences between the two configurations?
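For reference, here is my understanding of the two configurations, sketched with plain NumPy stand-ins (the repo uses a *custom* layer normalization, so `layer_norm` below is only an approximation of it):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Paper variant: normalize each feature across the batch (axis 0),
    # so the statistics depend on the batch composition.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def layer_norm(x, eps=1e-5):
    # Code variant: normalize each sample across its own features (axis 1),
    # so the statistics are independent of batch size.
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def tanh(x):
    # Saturating activation, output bounded in (-1, 1).
    return np.tanh(x)

def elu(x, alpha=1.0):
    # Unbounded above, smooth below zero, floored at -alpha,
    # so positive activations do not saturate.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.random.randn(8, 4)
paper_out = tanh(batch_norm(x))   # per-feature stats, bounded output
code_out = elu(layer_norm(x))     # per-sample stats, unbounded above
```

If I follow correctly, the main qualitative differences would be batch-size-independent statistics (layer norm) and non-saturating positive activations (ELU), but I would appreciate your reasoning.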
Thank you very much!