jeffheaton / encog-java-core

http://www.heatonresearch.com/encog

ReLU activation #185

Closed: nitbix closed this issue 7 years ago

nitbix commented 9 years ago

Hi, is anybody already working on adding Rectified Linear Units (ReLU)? Otherwise I'll work on adding support for them - seems easy enough.

jeffheaton commented 9 years ago

That would be great. I am not.

ekerazha commented 9 years ago

I experimented with the ReL activation function in Encog; my only concern is that ReL is not differentiable at 0. I looked at other implementations and they usually just return 0 as the derivative at 0, but I don't know whether we should tweak something in Encog to handle this situation properly. @jeffheaton what do you think about this?

jeffheaton commented 9 years ago

That should be okay; that is the purpose of the hasDerivative() method defined on the ActivationFunction interface.
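For reference, a minimal sketch of the idea (plain Java, not a complete ActivationFunction implementation; the real interface has additional methods such as getParams() and clone()):

    // Sketch only: a ReLU-style function that declares it can be trained
    // with derivative-based methods by reporting hasDerivative() == true.
    public final class ReLUSketch {

        /** max(0, x) applied to a single value. */
        public double activate(final double x) {
            return Math.max(0.0, x);
        }

        /** Report that a (sub)derivative is available for propagation training. */
        public boolean hasDerivative() {
            return true;
        }
    }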

ekerazha commented 9 years ago

The point is that hasDerivative() should return true so we can use it with gradient descent methods (in fact, I'd say ReLU's main purpose is to provide a fast activation function that is also more resistant to vanishing-gradient problems in deep networks). However, the rectified linear function is not differentiable at 0 ( http://www.wolframalpha.com/input/?i=f%28x%29+%3D+max%280%2C+x%29 ), although it is differentiable above and below 0. I looked at other implementations and they usually "cheat" and just return 0 as the derivative at 0. Do you think we can ignore this detail, or should we tweak something in Encog to avoid ever requesting a derivative at exactly 0?

ekerazha commented 9 years ago

The Mocha ( https://github.com/pluskid/Mocha.jl ) docs say:

ReLU is actually not differentiable at 0. But it has subdifferential [0,1]. Any value in that interval could be taken as a subderivative, and could be used in SGD if we generalize from gradient descent to subgradient descent. In the implementation, we choose 0.

Source: http://mochajl.readthedocs.org/en/latest/user-guide/neuron.html
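To make the subgradient point concrete, here is a small illustrative sketch (hypothetical, not Encog API): any fixed value in [0, 1] is a valid subderivative of max(0, x) at x = 0, and the choice only matters when the pre-activation value is exactly zero.

    // Illustrative sketch: a ReLU derivative parameterized by the
    // subderivative chosen at exactly zero (any value in [0, 1] is valid).
    public final class ReLUSubgradient {

        private final double derivativeAtZero; // should lie in [0, 1]

        public ReLUSubgradient(final double derivativeAtZero) {
            this.derivativeAtZero = derivativeAtZero;
        }

        public double derivative(final double x) {
            if (x > 0) {
                return 1.0;
            }
            if (x < 0) {
                return 0.0;
            }
            return this.derivativeAtZero; // Mocha (and most libraries) use 0 here
        }
    }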

nitbix commented 9 years ago

Hi guys,

sorry for resurrecting an old conversation, but did we reach an agreement here in the end or should I just go ahead and do my own thing separately for the time being?

Thanks,

Alan


nitbix commented 9 years ago

Given the lack of response, I'm just going to go ahead and write my own. I can contribute it back if there is interest.


ekerazha commented 9 years ago

Adding the ReL activation function is easy, but there's still the question of differentiability at 0.

ekerazha commented 9 years ago

This was my implementation:

    public final void activationFunction(final double[] x, final int start,
            final int size) {

        // In-place ReLU: clamp negative values to zero, leave positives unchanged.
        for (int i = start; i < start + size; i++) {
            if (x[i] < 0) {
                x[i] = 0;
            }
        }
    }

and

    public final double derivativeFunction(double b, double a) {
        // b is the pre-activation value; the derivative at exactly 0 is taken as 0.
        if (b > 0) {
            return 1;
        }
        return 0;
    }

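For anyone who wants to try something like this in a network, a rough usage sketch follows (it assumes the activation ends up exposed as an ActivationReLU class, as in later Encog versions; layer sizes here are arbitrary):

    import org.encog.engine.network.activation.ActivationReLU;
    import org.encog.engine.network.activation.ActivationSigmoid;
    import org.encog.neural.networks.BasicNetwork;
    import org.encog.neural.networks.layers.BasicLayer;

    public class ReLUNetworkExample {
        public static void main(final String[] args) {
            // Hypothetical wiring: ReLU on the hidden layer, sigmoid on the output.
            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null, true, 10));                    // input layer
            network.addLayer(new BasicLayer(new ActivationReLU(), true, 20));    // hidden layer, ReLU
            network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1)); // output layer
            network.getStructure().finalizeStructure();
            network.reset();
        }
    }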
nitbix commented 9 years ago

Yes, hence the question of whether there was a consensus on what to do. I'm going to use 0 in my own fork and not contribute it back - that's what I meant by "write my own".


nitbix commented 9 years ago

Thanks, that's almost exactly what I ended up doing as well.


ghost commented 9 years ago

A smooth approximation to the ReLU is softplus: f(x) = ln(1 + e^x), with derivative f'(x) = 1/(1 + e^(-x)), so f'(0) = 0.5. That looks like a reasonable choice for the derivative of ReLU at zero, because it is the midpoint of the left and right derivatives there. Or just use softplus rather than ReLU.
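For comparison with the ReLU snippets above, a minimal sketch of the softplus math (plain Java, not tied to any Encog interface):

    // Softplus: a smooth approximation of max(0, x).
    public final class SoftPlus {

        /** f(x) = ln(1 + e^x) */
        public static double activation(final double x) {
            return Math.log(1.0 + Math.exp(x));
        }

        /** f'(x) = 1 / (1 + e^(-x)), the logistic sigmoid; f'(0) = 0.5. */
        public static double derivative(final double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }
    }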

jeffheaton commented 7 years ago

For Encog 3.4 I've been using this:

    public final double derivativeFunction(final double b, final double a) {
        if (b <= this.params[ActivationReLU.PARAM_RELU_LOW_THRESHOLD]) {
            return 0;
        }
        return 1.0;
    }

It seems to converge well. I believe it is a common convention for the derivative to be 0 at zero; see, for example: http://kawahara.ca/what-is-the-derivative-of-relu/
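A quick sanity check of that convention (this snippet assumes the low threshold defaults to 0 when ActivationReLU is constructed with no arguments, which is an assumption here):

    // Hypothetical check, assuming ActivationReLU() defaults its low threshold to 0.
    ActivationReLU relu = new ActivationReLU();
    System.out.println(relu.derivativeFunction(-1.0, 0.0)); // expected 0.0 (below threshold)
    System.out.println(relu.derivativeFunction(0.0, 0.0));  // expected 0.0 (at the threshold, per the convention above)
    System.out.println(relu.derivativeFunction(2.0, 2.0));  // expected 1.0 (above threshold)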