wangkuiyi / deeplearning

The Gritty Details of Deep Learning

Need a chapter comparing activations #14

Open wangkuiyi opened 6 years ago

wangkuiyi commented 6 years ago

Not just sigmoid and tanh -- neither of them is good enough on its own.

We should also cover ReLU and its variants.

A comparison is here https://datascience.stackexchange.com/questions/14349/difference-of-activation-functions-in-neural-networks-in-general
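
For illustration, here is a minimal NumPy sketch of ReLU and two common variants (Leaky ReLU and ELU; this particular choice of variants is an example, not a fixed list for the chapter):

```python
import numpy as np

# ReLU and two common variants, written with NumPy for illustration only.
def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a gradient flowing for x < 0.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth negative saturation instead of a hard zero.
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.linspace(-3, 3, 7)
print(relu(x))
print(leaky_relu(x))
print(elu(x))
```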

An additional note: the maximum slope of the sigmoid is 1/4, while that of tanh is 1, four times larger. A larger gradient is preferred because gradients are multiplied along the chain rule, so small per-layer slopes compound into vanishing gradients in deep networks.
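
A quick numerical check of these two maxima (a sketch, not part of the book's code); both derivatives peak at x = 0:

```python
import numpy as np

x = np.linspace(-10, 10, 100001)

sigmoid = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = sigmoid * (1.0 - sigmoid)   # sigma'(x) = sigma(x) * (1 - sigma(x))

tanh = np.tanh(x)
d_tanh = 1.0 - tanh ** 2                # tanh'(x) = 1 - tanh(x)^2

print(d_sigmoid.max())  # ~0.25, attained at x = 0
print(d_tanh.max())     # ~1.0,  attained at x = 0
```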

wangkuiyi commented 6 years ago

ReLU needs to work together with batch normalization.
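
A minimal sketch (assuming PyTorch, which is my choice here, not the book's) of the common pattern of pairing ReLU with batch normalization: normalizing pre-activations keeps a reasonable fraction of units in ReLU's positive region rather than stuck at zero:

```python
import torch
from torch import nn

# Linear -> BatchNorm -> ReLU, a common ordering for this pairing.
block = nn.Sequential(
    nn.Linear(256, 128),
    nn.BatchNorm1d(128),   # normalize pre-activations over the batch
    nn.ReLU(),
)

x = torch.randn(32, 256)   # a batch of 32 examples
y = block(x)
print(y.shape)             # torch.Size([32, 128])
```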