Swish: a Self-Gated Activation Function
Prajit Ramachandran, Barret Zoph, Quoc V. Le
New activation function
f(x) = x * sigmoid(x)
Key points from paper:
1.) The success of Swish implies that the gradient preserving property of ReLU (i.e., having a derivative of 1 when x > 0) may no longer be a distinct advantage in modern architectures.
2.) Activation functions should be unbounded above, so they do not saturate for large positive inputs
3.) But being bounded below is desirable, as it acts as a regularizer: large negative inputs are squashed toward zero and effectively forgotten (see the sketch after this list)
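A minimal sketch of the function and the two boundedness properties, written here in plain Python with NumPy (an assumption; this is not the paper's code, and the sample inputs are only illustrative):

import numpy as np

def swish(x):
    # Swish: f(x) = x * sigmoid(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

xs = np.array([-20.0, -1.278, 0.0, 1.0, 20.0])
print(swish(xs))
# Unbounded above: swish(20) is roughly 20, i.e. close to the identity for large x.
# Bounded below: the minimum is about -0.278 near x = -1.278, and very negative
# inputs are squashed toward 0, which is the regularizing effect mentioned above.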
Discussion on reddit:
1.) Combining with SELU