spribitzer opened this issue 5 years ago
Using ReLU as the activation for any layer except the last introduces spiky features into the fitted distance distribution (see figure; blue is the true distribution, red the fit).
The effect is most pronounced when the third layer uses ReLU, and barely noticeable when it is the first or second layer.
Apparently sigmoid activations are no longer the default in most modern neural networks, and ReLUs are generally easier to train.
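For reference, here is a minimal sketch of the kind of comparison described above. It is not the project's actual architecture: the layer count, layer sizes, and PyTorch implementation are my own illustrative assumptions. It just shows swapping the activation of a single hidden layer between sigmoid and ReLU and running a forward pass with each variant.

```python
import torch
import torch.nn as nn

def make_net(n_in, n_out, relu_layer=None, hidden=128):
    """Three hidden layers with sigmoid activations; optionally replace
    the activation of one hidden layer (1, 2, or 3) with ReLU.
    All sizes here are illustrative, not the project's real settings."""
    acts = [nn.Sigmoid(), nn.Sigmoid(), nn.Sigmoid()]
    if relu_layer is not None:
        acts[relu_layer - 1] = nn.ReLU()
    return nn.Sequential(
        nn.Linear(n_in, hidden), acts[0],
        nn.Linear(hidden, hidden), acts[1],
        nn.Linear(hidden, hidden), acts[2],
        nn.Linear(hidden, n_out),   # last layer: no ReLU, as in the issue
    )

# Example: compare an all-sigmoid network with one whose third hidden
# layer uses ReLU (the case where the spikes were most pronounced).
net_sigmoid = make_net(n_in=256, n_out=100)
net_relu3   = make_net(n_in=256, n_out=100, relu_layer=3)

x = torch.randn(1, 256)          # dummy input signal
p_sigmoid = net_sigmoid(x)       # predicted distance distribution
p_relu3   = net_relu3(x)
```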