hypnopump / E-swish

Code for reproducibility of the E-swish paper experiments

The second original Swish paper #1

Open EliasHasle opened 6 years ago

EliasHasle commented 6 years ago

Here: https://www.semanticscholar.org/paper/Searching-for-Activation-Functions-Ramachandran-Zoph/c8c4ab59ac29973a00df4e5c8df3773a3c59995a

It was published on arXiv before your paper, so it should be cited and commented on, in my opinion. They found (or "found"), through a search, the Swish function with a beta factor inside the sigmoid, whereas you add one outside.

As far as I can see, for unconstrained weights a beta outside the sigmoid does exactly the same as increasing all the weights from the node, so the network will be able to represent exactly the same functions as pure swish (except the last layer may have no weights out). And a beta inside the sigmoid is equivalent to changing all the weights into the node (except the first layer may have no weights into it).
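As a quick numerical check of the argument above, here is a minimal NumPy sketch. The two-layer network, its shapes, the random weights, and the function names (`e_swish`, `swish_beta`) are all hypothetical and chosen only for illustration. One detail worth noting: since x * sigmoid(beta * x) = swish(beta * x) / beta, the inside-beta case matches plain Swish when the incoming weights and bias are multiplied by beta and the outgoing weights are divided by beta.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(z):
    # Plain Swish: z * sigmoid(z)
    return z * sigmoid(z)

def e_swish(z, beta):
    # Beta outside the sigmoid (E-swish form)
    return beta * z * sigmoid(z)

def swish_beta(z, beta):
    # Beta inside the sigmoid (form from "Searching for Activation Functions")
    return z * sigmoid(beta * z)

# Hypothetical toy network: 3 inputs -> 4 hidden units -> 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
beta = 1.5

# Case 1: beta outside the sigmoid vs. plain Swish with outgoing weights scaled by beta
out_e_swish = e_swish(x @ W1 + b1, beta) @ W2 + b2
out_scaled_out = swish(x @ W1 + b1) @ (beta * W2) + b2
outside_matches = np.allclose(out_e_swish, out_scaled_out)

# Case 2: beta inside the sigmoid vs. plain Swish with incoming weights and bias
# scaled by beta and outgoing weights scaled by 1 / beta
out_swish_beta = swish_beta(x @ W1 + b1, beta) @ W2 + b2
out_scaled_in = swish(x @ (beta * W1) + beta * b1) @ (W2 / beta) + b2
inside_matches = np.allclose(out_swish_beta, out_scaled_in)

print(outside_matches, inside_matches)
```

Both comparisons only show that the networks compute the same function for these weights; they say nothing about how training would behave under either parameterization.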

So basically, the beta parameters only affect the learning process, and will obviously interact with other learning parameters/choices and regularization. (Using SGD instead of Adam for a comparison based on another paper counts as such a choice.)
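One way to see how an outside beta can change learning without changing the representable functions is to compare a single plain gradient-descent step in both parameterizations. The sketch below is a scalar toy case with hypothetical values (`z`, `target`, `w`, `lr`) and a squared-error loss chosen for illustration; it is not taken from either paper. Model A uses `y = beta * swish(z) * w`, model B uses `y = swish(z) * v` with `v = beta * w`, so both start from the same function.

```python
import math

def swish(z):
    return z / (1.0 + math.exp(-z))

# Hypothetical values, for illustration only
beta, lr = 1.5, 0.1
z, target = 0.8, 1.0
w = 0.5          # outgoing weight in model A (beta kept in the activation)
v = beta * w     # equivalent outgoing weight in model B (beta folded into the weight)

a = swish(z)
y = beta * a * w             # same output as a * v
err = y - target             # derivative of 0.5 * (y - target)**2 w.r.t. y

# Gradients of the loss with respect to each model's own weight
grad_w = err * beta * a      # model A: dy/dw = beta * a
grad_v = err * a             # model B: dy/dv = a

# One plain gradient-descent step in each parameterization
w_new = w - lr * grad_w
v_new = v - lr * grad_v

# Compare the change in the effective outgoing weight (beta * w for model A)
step_a = beta * w_new - v
step_b = v_new - v
ratio = step_a / step_b

print(ratio, beta ** 2)
```

In this toy setting the effective step in model A is `beta ** 2` times the step in model B, so with plain gradient descent the outside beta acts like a change of learning rate on the outgoing weights. Optimizers with per-parameter scaling, weight decay, and initialization schemes would interact with beta differently.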

Please enlighten me if I am wrong.

MichaelFomenko commented 4 years ago

He clearly doesn't understand anything about deep learning; he only published this paper to have a published paper for his career.

hypnopump commented 4 years ago

Hi @EliasHasle, I already cite the paper by Ramachandran et al. in my paper. Regarding the concern about the beta parameter, it's not the same:

I hope the image is clarifying!

dafuq

hypnopump commented 4 years ago

@MichaelFomenko, always glad to receive constructive criticism.

MichaelFomenko commented 4 years ago

Sorry to tell you the truth, EricAlcaide, but you clearly don't understand deep learning. If you did, you would know that the beta in your E-swish function is just the weight of the next layer. This means that mathematically there is no difference between your E-swish and the Swish function.