nishnik / Paper-Leaf

Contains descriptions of various papers I have read or am reading

Swish: a Self-Gated Activation Function #16

Open nishnik opened 5 years ago

nishnik commented 5 years ago

Swish: a Self-Gated Activation Function
Prajit Ramachandran, Barret Zoph, Quoc V. Le

New activation function: f(x) = x · sigmoid(x)
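A minimal NumPy sketch of the function (my own illustration, not code from the paper). The paper's general form is f(x) = x · sigmoid(βx); β = 1 gives the plain x · sigmoid(x) variant summarized here:

```python
import numpy as np

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x).
    # beta = 1 recovers the x * sigmoid(x) form above.
    return x / (1.0 + np.exp(-beta * x))
```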
Key points from paper:
1.) The success of Swish implies that the gradient preserving property of ReLU (i.e., having a derivative of 1 when x > 0) may no longer be a distinct advantage in modern architectures.
2.) Being unbounded above is desirable: the output does not saturate for large positive inputs, so gradients do not vanish the way they do for sigmoid/tanh.
3.) But being bounded below is desirable, as it acts as a regularizer: large negative inputs are squashed toward zero and effectively forgotten (see the numeric check below).
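A quick numeric check of points 2 and 3 (my own illustration, assuming the β = 1 form above):

```python
import numpy as np

def swish(x):
    return x / (1.0 + np.exp(-x))  # beta = 1

# Unbounded above: swish(x) approaches x for large positive x,
# so it does not saturate.
print(swish(np.array([10.0, 100.0])))  # ~[9.9995, 100.0]

# Bounded below: swish(x) approaches 0 for large negative x, with a
# global minimum of about -0.278 near x ~ -1.278; large negative
# activations are squashed, which is the regularizing effect above.
xs = np.linspace(-10.0, 0.0, 100001)
i = np.argmin(swish(xs))
print(xs[i], swish(xs[i]))             # ~(-1.278, -0.278)
```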

Discussion on Reddit:
1.) Combining it with SELU