Closed nitbix closed 7 years ago
Hi, is anybody already working on adding Rectified Linear Units (ReLU)? Otherwise I'll work on adding support for them - it seems easy enough.
That would be great. I am not currently working on it.
I experimented with the ReL activation function in Encog; my only concern is that ReL is not differentiable at 0. I looked at other implementations and they usually just return 0 as the derivative at 0, but I don't know whether we should tweak something in Encog to properly handle this situation. @jeffheaton what do you think about this?
That should be okay; that is the purpose of the hasDerivative() method defined on the ActivationFunction interface.
The point is that hasDerivative() should return true so we can use it with gradient-descent methods (actually, I'd say its main purpose is to be a fast activation function that is also more resistant to vanishing-gradient problems in deep networks), but the Rectified Linear function is not differentiable at 0 ( http://www.wolframalpha.com/input/?i=f%28x%29+%3D+max%280%2C+x%29 ), while it is differentiable above and below 0. I looked at other implementations and they usually "cheat" and just return 0 as the derivative at 0. Do you think we can ignore this detail, or should we tweak something in Encog to avoid having to request a derivative at exactly 0?
Mocha ( https://github.com/pluskid/Mocha.jl) docs say:
ReLU is actually not differentiable at 0. But it has a subdifferential [0, 1]. Any value in that interval could be taken as a subderivative, and could be used in SGD if we generalize from gradient descent to subgradient descent. In the implementation, we choose 0.
Source: http://mochajl.readthedocs.org/en/latest/user-guide/neuron.html
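In Encog-style code, that choice could be expressed as in the following sketch (an illustration only, not Mocha's or Encog's actual code); the value used at exactly 0 is kept as an explicit constant so the subgradient choice is visible:

public final class ReluDerivativeSketch {

    // Any value in [0, 1] is a valid subgradient of max(0, x) at x = 0.
    // Mocha picks 0; 0.5 or 1 would also be legitimate choices.
    private static final double SUBGRADIENT_AT_ZERO = 0.0;

    public static double derivative(final double x) {
        if (x > 0) {
            return 1.0;
        } else if (x < 0) {
            return 0.0;
        }
        return SUBGRADIENT_AT_ZERO; // the boundary case this thread is discussing
    }
}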
Hi guys,
sorry for resurrecting an old conversation, but did we reach an agreement here in the end or should I just go ahead and do my own thing separately for the time being?
Thanks,
Alan
Given the lack of response, I'm just going to go ahead and write my own. I can contribute it back if there is interest.
Adding the ReL activation function is easy, but there's the question about differentiability at 0.
This was my implementation:
public final void activationFunction(final double[] x, final int start,
        final int size) {
    // Apply ReLU in place: max(0, x[i]) over the requested slice of the array.
    for (int i = start; i < start + size; i++) {
        if (x[i] < 0) {
            x[i] = 0;
        }
    }
}
and
public final double derivativeFunction(double b, double a) {
    // b is the pre-activation value, a is the activated output.
    // Return 1 for positive inputs, 0 otherwise (including at exactly 0).
    if (b > 0) {
        return 1;
    }
    return 0;
}
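For anyone following along, wiring an activation like this into a network would look roughly like the snippet below. It assumes the class ends up being called ActivationReLU (as it eventually was in later Encog releases); treat it as a sketch rather than the final API:

import org.encog.engine.network.activation.ActivationReLU;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

// Sketch: a small feed-forward network with ReLU hidden units.
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 2));                      // input layer
network.addLayer(new BasicLayer(new ActivationReLU(), true, 10));     // hidden layer using ReLU
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));  // output layer
network.getStructure().finalizeStructure();
network.reset();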
Yes, hence the question of whether there was a consensus on what to do. I'm going to use 0 in my own fork and not contribute it back - that's what I meant by "write my own".
Thanks, that's almost exactly what I ended up doing as well.
A smooth approximation to the ReLU is softplus: f(x) = ln(1 + e^x), with derivative f'(x) = 1/(1 + e^(-x)), so f'(0) = 0.5. That looks like a reasonable choice for the derivative of ReLU at zero, because it is the midpoint of the left and right derivatives there. Or just use softplus rather than ReLU.
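Using the same method signatures as the ReLU code above, a softplus activation might look roughly like this (a sketch only, not code shown anywhere else in this thread):

// Softplus: f(x) = ln(1 + e^x), a smooth approximation of ReLU.
public final void activationFunction(final double[] x, final int start,
        final int size) {
    for (int i = start; i < start + size; i++) {
        x[i] = Math.log(1.0 + Math.exp(x[i]));
    }
}

// Derivative is the logistic sigmoid: f'(x) = 1 / (1 + e^(-x)), so f'(0) = 0.5.
// b is the pre-activation value, a is the activated output.
public final double derivativeFunction(final double b, final double a) {
    return 1.0 / (1.0 + Math.exp(-b));
}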
For Encog 3.4 I've been using this:
public final double derivativeFunction(final double b, final double a) {
    // Anything at or below the configured low threshold is treated as the flat region.
    if (b <= this.params[ActivationReLU.PARAM_RELU_LOW_THRESHOLD]) {
        return 0;
    }
    return 1.0;
}
It seems to converge well. I believe it is common convention for the derivative to be 0 at zero. See, for example: http://kawahara.ca/what-is-the-derivative-of-relu/
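For completeness, the forward pass that pairs with that derivative presumably clamps everything at or below the threshold to a configured "low" output value. A sketch, assuming the same param-array convention and a hypothetical PARAM_RELU_LOW constant that may differ from the final Encog code:

public final void activationFunction(final double[] x, final int start,
        final int size) {
    // Values at or below the low threshold are clamped to the "low" output
    // (both 0 for a plain ReLU); everything else passes through unchanged.
    for (int i = start; i < start + size; i++) {
        if (x[i] <= this.params[ActivationReLU.PARAM_RELU_LOW_THRESHOLD]) {
            x[i] = this.params[ActivationReLU.PARAM_RELU_LOW];
        }
    }
}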