Closed knkski closed 6 years ago
Based on this paper (https://arxiv.org/pdf/1511.07289v1.pdf), ELU seems to work better than ReLU or Leaky ReLU. Has this been tested with GPU training?
@ziziyue: That seems accurate. This graph shows the effects of different activation functions on a relatively simple network:
From left to right, the activation functions used were ELU, Leaky ReLU, PReLU, and ReLU. They all converged to nearly the same final value, but ELU converged much faster than the others.
Closing this issue, as it looks like ELU is a clear winner.
Right now we're using ReLU for our activation layers because it works reasonably well. However, we should test the alternatives and make a nice graph that we could present:
https://keras.io/activations/
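For reference, the activation functions being compared above can be sketched in plain Python. This is a minimal illustration of their shapes, not the Keras implementations; the alpha defaults here are common choices and an assumption, not necessarily what the graph used.

```python
import math

def relu(x):
    # ReLU: zero for negative inputs, identity for positive inputs
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU: small fixed slope alpha for negative inputs
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    # ELU: smooth exponential saturation toward -alpha
    # for negative inputs, identity for positive inputs
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

# PReLU has the same shape as Leaky ReLU, except that alpha is a
# learned parameter rather than a fixed constant.
for x in (-2.0, -0.5, 0.0, 1.5):
    print(f"x={x:5.1f}  relu={relu(x):7.4f}  "
          f"leaky={leaky_relu(x):7.4f}  elu={elu(x):7.4f}")
```

The key difference is the negative half: ReLU is flat (dead units possible), Leaky ReLU/PReLU keep a small linear gradient, and ELU saturates smoothly, which the paper credits for its faster convergence.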