jeffheaton / encog-java-core

http://www.heatonresearch.com/encog

XORHelloWorld fails to train sporadically #137

Closed MarkRebuck closed 7 years ago

MarkRebuck commented 11 years ago

I am testing the feasibility of replacing a home-grown Neural Network library with encog. As a first step of my feasibility study, I replaced my existing XOR unit test with an encog-based version. My unit test is almost a direct cut/paste from the XORHelloWorld sample.

However, every so often the unit test fails. Included below is a slightly modified version of XORHelloWorld that tries to train XOR 1000 times, abandoning an attempt if the network has not trained after 1,000,000 calls to iteration(). On every JVM I tried, I get a repeatable 4-5 failures per 1000 attempts. While I understand that not every network will train every problem every time... we're talking about XOR here, the world's simplest neural network problem. Clearly something is amiss :-).

When the network fails to train, it fails quite badly. In other words, it doesn't "get close but not all the way to tolerance". It "goes off the deep end and gives wonky results".

On inspection, I saw that the sample tries to train all the way to 0/1, which is a difficult thing to do with a sigmoid activation on the output: a sigmoid only reaches 0 and 1 asymptotically, so the weights must grow without bound to close the gap. Changing the sample targets from the defaults to:

    public static double XOR_IDEAL[][] = { { 0.01 }, { 0.99 }, { 0.99 }, { 0.01 } };

...trained quickly 10 million out of 10 million times. Problem solved, right? Well...

While this could be considered an issue with the sample training to unreasonable values, I believe it goes deeper than that. Looking at ActivationSigmoid, BoundMath, and BoundNumbers, it appears that encog handles sigmoid and its derivative quite poorly at the extremes. The home-grown network I'm replacing had a similar issue many years ago, which was fixed by doing the following (feel free to implement this as a patch if you wish):

    public static double sigmoid(double val) {
        // |val| > 18.42... puts sigmoid(val) within 1e-8 of its asymptote, so
        // clamp the result just inside (0, 1) instead of letting it (and the
        // derivative s * (1 - s)) collapse to exactly 0 or 1.
        if (val < -18.4206807389) {
            val = 0.00000001;
        } else if (val > 18.4206807389) {
            val = 0.99999999;
        } else {
            val = 1.0 / (1.0 + Math.exp(-val));
        }
        return val;
    }

Without the proper bounds checks, I believe encog is merrily wandering off towards a divide-by-zero error while calculating the derivative of sigmoid() near its limits.
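For illustration, here is a minimal sketch of why the clamp matters for the derivative (this is a sketch, not Encog's actual ActivationSigmoid code):

    // Sketch only: the sigmoid derivative expressed in terms of the sigmoid's
    // output. With sigmoid() clamped to [1e-8, 1 - 1e-8] as above, the product
    // s * (1 - s) stays bounded away from zero, so the gradient never vanishes
    // to exactly 0 and nothing downstream can end up dividing by it.
    public static double sigmoidDerivative(double val) {
        double s = sigmoid(val);
        return s * (1.0 - s);
    }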

Here is the modified version of XORHelloWorld which shows the sporadic failure:

/*
 * Encog(tm) Examples v3.1 - Java Version
 * http://www.heatonresearch.com/encog/
 * http://code.google.com/p/encog-java/
 *
 * Copyright 2008-2012 Heaton Research, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 *   
 * For more information on Heaton Research copyrights, licenses 
 * and trademarks visit:
 * http://www.heatonresearch.com/copyright
 */
package org.encog.examples.neural.xor;

import org.encog.Encog;
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLData;
import org.encog.ml.data.MLDataPair;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

/**
 * XOR: This example is essentially the "Hello World" of neural network
 * programming.  This example shows how to construct an Encog neural
 * network to predict the output from the XOR operator.  This example
 * uses resilient propagation (RPROP) to train the neural network.
 * 
 * This example attempts to use a minimum of Encog features to create and
 * train the neural network.  This allows you to see exactly what is going
 * on.  For a more advanced example, that uses Encog factories, refer to
 * the XORFactory example.
 * 
 */
public class XORHelloWorld {

    /**
     * The input necessary for XOR.
     */
    public static double XOR_INPUT[][] = { { 0.0, 0.0 }, { 1.0, 0.0 },
            { 0.0, 1.0 }, { 1.0, 1.0 } };

    /**
     * The ideal data necessary for XOR.
     */
    public static double XOR_IDEAL[][] = { { 0.0 }, { 1.0 }, { 1.0 }, { 0.0 } };

    /**
     * The main method.
     * @param args No arguments are used.
     */
    public static void main(final String args[]) {
        for (int attempts = 0; attempts < 1000; attempts++) {
            // create a neural network, without using a factory
            BasicNetwork network = new BasicNetwork();
            network.addLayer(new BasicLayer(null,true,2));
            network.addLayer(new BasicLayer(new ActivationSigmoid(),true,3));
            network.addLayer(new BasicLayer(new ActivationSigmoid(),false,1));
            network.getStructure().finalizeStructure();
            network.reset();

            // create training data
            MLDataSet trainingSet = new BasicMLDataSet(XOR_INPUT, XOR_IDEAL);

            // train the neural network
            final ResilientPropagation train = new ResilientPropagation(network, trainingSet);

            int epoch = 1;

            boolean failed = false;
            do {
                train.iteration();
                //System.out.println("Epoch #" + epoch + " Error:" + train.getError());
                epoch++;
                if (epoch > 1000000) {
                    failed = true;
                    break;
                }
            } while(train.getError() > 0.01);

            // test the neural network
            if (failed) {
                System.out.println("Failed to train netword after attempt #" + attempts);
                for(MLDataPair pair: trainingSet ) {
                    final MLData output = network.compute(pair.getInput());
                    System.out.println(pair.getInput().getData(0) + "," + pair.getInput().getData(1)
                            + ", actual=" + output.getData(0) + ",ideal=" + pair.getIdeal().getData(0));
                }
            }

        }

        Encog.getInstance().shutdown();
    }
}
MarkRebuck commented 11 years ago

If anyone is looking at this...

Adding (new ConsistentRandomizer(-1,1,944267208)).randomize(network); after network.reset(); in the stock XORHelloWorld.java reliably reproduces the failure-to-train for me. Adding a network.dumpWeights() call shows the hidden-layer weights diverging during training.
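For anyone reproducing this, those lines in context (a sketch against the stock example; the import path assumes Encog 3.x):

    import org.encog.mathutil.randomize.ConsistentRandomizer;
    // ...
    network.getStructure().finalizeStructure();
    network.reset();
    // Seed 944267208 deterministically selects the failing initial weights.
    new ConsistentRandomizer(-1, 1, 944267208).randomize(network);
    System.out.println(network.dumpWeights()); // watch the hidden-layer weights diverge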

jeffheaton commented 10 years ago

Thank you for the information. I was able to reproduce it with your data. I also tried your version of the sigmoid, but it still fails to converge from that starting point.

otearle commented 7 years ago

Hi Jeff, we are seeing this issue too. Any progress on solving it? I'm more than happy to investigate; would you have any idea where a good starting point would be? Cheers

jeffheaton commented 7 years ago

This is really just an issue of a bad set of initial weights.
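A minimal sketch of the usual mitigation, reusing network and trainingSet from the example above: when training stalls, throw the weights away and retry from a fresh random initialization.

    // Sketch: restart RPROP with fresh random weights when it fails to converge.
    ResilientPropagation train;
    int restarts = 0;
    do {
        network.reset(); // draw a new set of initial weights
        train = new ResilientPropagation(network, trainingSet);
        int epoch = 0;
        do {
            train.iteration();
            epoch++;
        } while (train.getError() > 0.01 && epoch < 10000);
    } while (train.getError() > 0.01 && ++restarts < 10);

Encog also ships training strategies (e.g. org.encog.ml.train.strategy.ResetStrategy) that can automate this kind of restart when attached to a trainer.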