huu4ontocord opened this issue 3 years ago (status: open)
Should the layer-norm `eps` be passed to `TransformerEncoderLayer` and `TransformerDecoderLayer`? That would mean accepting `eps` in each layer's constructor and forwarding it to the norms, and also threading it through the builder and the weight mapper.
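To illustrate what "threading `eps` through" could look like, here is a minimal sketch in plain PyTorch. `EncoderLayerSketch` and `build_encoder` are hypothetical stand-ins, not the library's actual classes or builder API. The idea is that one `eps` value enters at the top and reaches every `nn.LayerNorm`, instead of each norm silently using PyTorch's default of `1e-5`. Hugging Face's `BertConfig` defaults to `layer_norm_eps=1e-12`, so a mismatch there could make mapped BERT weights produce slightly different outputs.

```python
import torch
from torch import nn


class EncoderLayerSketch(nn.Module):
    """Hypothetical post-norm encoder layer whose LayerNorm eps is configurable."""

    def __init__(self, d_model, n_heads, d_ff=None, dropout=0.1, eps=1e-5):
        super().__init__()
        d_ff = d_ff or 4 * d_model
        self.attention = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.linear1 = nn.Linear(d_model, d_ff)
        self.linear2 = nn.Linear(d_ff, d_model)
        # eps is forwarded to both norms rather than relying on the 1e-5 default
        self.norm1 = nn.LayerNorm(d_model, eps=eps)
        self.norm2 = nn.LayerNorm(d_model, eps=eps)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Self-attention sublayer with residual connection, then norm
        attn_out, _ = self.attention(x, x, x, need_weights=False)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward sublayer with residual connection, then norm
        y = self.linear2(self.dropout(torch.relu(self.linear1(x))))
        return self.norm2(x + self.dropout(y))


def build_encoder(n_layers, d_model, n_heads, eps=1e-5):
    """Hypothetical builder: the same eps is passed to every layer it creates."""
    return nn.Sequential(
        *[EncoderLayerSketch(d_model, n_heads, eps=eps) for _ in range(n_layers)]
    )
```

For example, `build_encoder(12, 768, 12, eps=1e-12)` would match BERT-base's norm epsilon. A weight mapper would then only need to copy weights; the `eps` itself is a hyperparameter and has to come from the builder.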
On a related note, it would be nice to be able to specify other normalization layers, such as ScaleNorm, and to configure the activation function to something other than the built-in PyTorch ones (e.g. relu^2).
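As a rough sketch of both requests, assuming the usual ScaleNorm formulation `g * x / ||x||` (a single learnable scalar `g`, initialized to `sqrt(d_model)`) and squared ReLU defined as `relu(x) ** 2`. The `NORMS`/`ACTIVATIONS` registries and the `make_norm`/`make_feed_forward` helpers are hypothetical. They only show how a builder could accept these choices by name; they are not an existing option in the library.

```python
import torch
from torch import nn


class ScaleNorm(nn.Module):
    """ScaleNorm: g * x / ||x||, with a single learnable scalar g (init sqrt(d))."""

    def __init__(self, d_model, eps=1e-5):
        super().__init__()
        self.g = nn.Parameter(torch.tensor(d_model ** 0.5))
        self.eps = eps

    def forward(self, x):
        # L2 norm over the feature dimension, clamped to avoid division by zero
        norm = x.norm(dim=-1, keepdim=True).clamp(min=self.eps)
        return self.g * x / norm


class ReLUSquared(nn.Module):
    """Squared ReLU activation: relu(x) ** 2."""

    def forward(self, x):
        return torch.relu(x) ** 2


# Hypothetical registries a builder could look names up in
NORMS = {
    "layernorm": lambda d, eps: nn.LayerNorm(d, eps=eps),
    "scalenorm": lambda d, eps: ScaleNorm(d, eps=eps),
}
ACTIVATIONS = {
    "relu": nn.ReLU,
    "gelu": nn.GELU,
    "relu2": ReLUSquared,
}


def make_norm(kind, d_model, eps=1e-5):
    """Build a normalization layer by name (hypothetical helper)."""
    return NORMS[kind](d_model, eps)


def make_feed_forward(d_model, d_ff, activation="relu"):
    """Build a feed-forward block with a configurable activation (hypothetical)."""
    return nn.Sequential(
        nn.Linear(d_model, d_ff),
        ACTIVATIONS[activation](),
        nn.Linear(d_ff, d_model),
    )
```

A name-based registry keeps the builder's arguments as plain strings. Alternatively, the builder could accept any callable or `nn.Module` factory directly, which is what PyTorch's own `nn.TransformerEncoderLayer` does for its `activation` argument.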
From the code (adapted from `test_weight_mapper.py`):
The encoder looks like this:
BERT looks like this: