Closed allanzelener closed 8 years ago
Change p from a global to a local variable in all modules.
In layer normalization, move the addition of eps inside the sqrt, i.e. sqrt(var + eps) instead of sqrt(var) + eps, to prevent an undefined gradient at sqrt(0).
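
To illustrate why the eps placement matters, here is a minimal pure-Python sketch. It is not the code from this PR: the function names, the scalar formulation, and the eps value of 1e-5 are all assumptions made for the example. It compares the analytic derivative of the two forms with respect to the variance.

```python
import math

EPS = 1e-5  # assumed value; the PR does not state which eps is used


def grad_eps_outside(var):
    """d/dvar [sqrt(var) + eps] = 1 / (2 * sqrt(var)); undefined at var == 0."""
    return 1.0 / (2.0 * math.sqrt(var))


def grad_eps_inside(var, eps=EPS):
    """d/dvar sqrt(var + eps) = 1 / (2 * sqrt(var + eps)); finite for var >= 0."""
    return 1.0 / (2.0 * math.sqrt(var + eps))


def layer_norm(x, eps=EPS):
    """Normalize a list of floats to zero mean, with eps inside the sqrt."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    denom = math.sqrt(var + eps)  # eps inside the sqrt, as in this change
    return [(v - mean) / denom for v in x]
```

With a constant input such as [3.0, 3.0, 3.0] the variance is exactly 0. In this sketch grad_eps_outside(0.0) raises ZeroDivisionError (an autodiff framework would typically produce inf or NaN instead), while grad_eps_inside(0.0) stays finite.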
LGTM