knkski / atai

Analyze This! AI competition

Investigate activation functions #5

Closed knkski closed 6 years ago

knkski commented 6 years ago

Right now we're using ReLU for our activation layers because it works pretty well. However, we should test the alternatives and make a nice pretty graph that we could present:

https://keras.io/activations/
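Something like the sketch below could drive that comparison. The repo's actual model and data aren't reproduced here, so a small dense network on random placeholder data stands in for them, and the names (`build_model`, `histories`) are just for illustration; the idea is to train one otherwise-identical model per activation layer and plot the loss curves side by side.

```python
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, ELU, LeakyReLU, PReLU, ReLU

# Dummy data standing in for the real training set (shapes are arbitrary here).
x = np.random.rand(1024, 64).astype("float32")
y = np.random.randint(0, 10, size=(1024,))

def build_model(activation_layer):
    """Small dense network; swap in the activation layer under test."""
    model = Sequential([
        Input(shape=(64,)),
        Dense(128),
        activation_layer,
        Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Train one copy of the network per activation and record its loss curve.
histories = {}
for name, layer in [("ReLU", ReLU()),
                    ("LeakyReLU", LeakyReLU()),
                    ("PReLU", PReLU()),
                    ("ELU", ELU())]:
    history = build_model(layer).fit(x, y, epochs=10, verbose=0)
    histories[name] = history.history["loss"]

# Plot the curves so the comparison can be presented as a single graph.
for name, losses in histories.items():
    plt.plot(range(1, len(losses) + 1), losses, label=name)
plt.xlabel("Epoch")
plt.ylabel("Training loss")
plt.legend()
plt.title("Loss by activation function")
plt.show()
```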

ykq1004 commented 6 years ago

Based on this paper, https://arxiv.org/pdf/1511.07289v1.pdf, ELU seems to work better than ReLU or Leaky ReLU. Should we test it out in a GPU training run?
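For reference, a quick NumPy sketch of the standard piecewise definitions (the alpha values shown are common defaults; this is just to show how ELU differs from ReLU and Leaky ReLU, not code from this repo):

```python
import numpy as np

def relu(x):
    # zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # small linear slope for negative inputs instead of a hard zero
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # smooth exponential saturation toward -alpha for negative inputs
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))
```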

knkski commented 6 years ago

@ziziyue: That seems accurate. This graph shows the effects of different activation functions on a relatively simple network:

[Graph: loss curves for ELU, Leaky ReLU, PReLU, and ReLU on a relatively simple network]

From left to right, the activation functions used were ELU, Leaky ReLU, PReLU, and ReLU. All of them converged to nearly the same final value, but ELU converged much faster than the others.

Closing this issue, as it looks like ELU is a clear winner.