snuspl / dolphin


Separates activation functions from fully connected layers and introduces activation layers #145

Closed beomyeol closed 8 years ago

beomyeol commented 8 years ago

This pull request removes activation functions from fully connected layers and introduces activation layers. The activation layer applies the activation function specified by the configuration file to its input value.

This pull request includes the following changes:

The LayerBase backPropagate() method API has changed: nextParameter is removed and input is introduced. The introduction of activation layers makes nextParameter unnecessary in backPropagate(). Although the input value is not needed yet, it will be needed to compute gradients of some activation functions, such as the absolute value and power functions.
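For reference, here is a rough sketch of the signature change. The parameter names, ordering, and types are assumptions based on the description above, not the exact LayerBase code:

import org.nd4j.linalg.api.ndarray.INDArray;

// Sketch only: a simplified view of the API change described above; parameter names,
// ordering, and types are assumptions, not the exact LayerBase signature.
public abstract class LayerSketch {
  // Before: the next layer's parameter had to be passed so a layer could compute its error.
  // public abstract INDArray backPropagate(INDArray activation, INDArray nextError,
  //                                        LayerParameter nextParameter);

  // After: nextParameter is dropped and the layer's input is passed instead, since
  // activation functions such as |x| or x^p need the input to compute their gradients.
  public abstract INDArray backPropagate(INDArray input, INDArray activation, INDArray nextError);
}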

This closes #144.

jsjason commented 8 years ago

@beomyeol The tests succeed, but the example finished with an error (although the training did progress according to the logs).

2015-11-19 15:36:56,171 INFO edu.snu.dolphin.dnn.NeuralNetworkREEF.main main | REEF job completed: FAILED(java.lang.NullPointerException: Null ids parameter.)

Am I the only one who has run into this error?

beomyeol commented 8 years ago

@jsjason I addressed your comments and left some replies to your questions. Also, I fixed a bug where backpropagation did not work correctly in a neural network with two layers, and introduced ActivationWithLossLayer.

ActivationWithLossLayer is a wrapper of ActivationLayer. Its forward pass is the same as that of ActivationLayer, but its backward pass computes the derivative of the configured loss function with respect to the layer's output value and its expected output value. The cross-entropy loss function is included in this pull request; we can add other loss functions later.

ActivationWithLossLayer is needed to support other loss functions, and to keep consistent which layer (the layer itself or the next layer) computes the errors used to generate the gradients for training that layer.
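As a rough illustration of the idea (not the code in this PR; the class and method below are hypothetical), the backward pass for cross-entropy can be written as:

import org.nd4j.linalg.api.ndarray.INDArray;

// Sketch only: illustrates an ActivationWithLoss-style backward pass for cross-entropy,
// L = -sum_i t_i * log(y_i). Not the actual Dolphin implementation.
public final class CrossEntropySketch {
  // Returns dL/dy given the layer output y and the expected output t: dL/dy_i = -t_i / y_i.
  public static INDArray derivative(final INDArray output, final INDArray expectedOutput) {
    return expectedOutput.div(output).negi();
  }
}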

beomyeol commented 8 years ago

@jsjason I did not experience the error you mentioned. Please send me the logs from when the error occurred.

jsjason commented 8 years ago

The new ActivationWithLoss layer seems good. @beomyeol Please check my comments. I'll show you the logs later, offline.

beomyeol commented 8 years ago

@jsjason Thanks for your review. I've addressed your comments. Please take another look.

jsjason commented 8 years ago

@beomyeol The changes look great, but I have an extra question. If I train a network with two FullyConnectedLayers and no ActivationWithLoss layer, I get an exception. I expected the application to finish correctly, although without any meaningful results. Is throwing an exception intended?

jsjason commented 8 years ago

Here is the code I ran:

final Configuration nnConf = NeuralNetworkConfigurationBuilder.newConfigurationBuilder()
    .setBatchSize(1)
    .setStepsize(1e-2f)
    .setParameterProviderClass(LocalNeuralNetParameterProvider.class)
    .addLayerConfiguration(
        FullyConnectedLayerConfigurationBuilder.newConfigurationBuilder()
            .setNumInput(3)
            .setNumOutput(4)
            .setInitWeight(0.0001f)
            .setInitBias(0.0002f)
            .setRandomSeed(10)
            .build())
    .addLayerConfiguration(
        FullyConnectedLayerConfigurationBuilder.newConfigurationBuilder()
            .setNumInput(4)
            .setNumOutput(3)
            .setInitWeight(0.0001f)
            .setInitBias(0.0002f)
            .setRandomSeed(10)
            .build())
    .build();

final Injector injector = Tang.Factory.getTang().newInjector(nnConf);
neuralNetwork = injector.getInstance(NeuralNetwork.class);
neuralNetwork.train(Nd4j.create(new float[]{1, 2, 3}), 1);

beomyeol commented 8 years ago

@jsjason If the last layer is a fully connected layer, backpropagation cannot work, because a fully connected layer needs the next layer's error. Since the error value that the last layer would receive as the argument of its backpropagation cannot be defined, the neural network model passes an empty array to the last layer's backpropagation, which causes the exception. Thus, the last layer's backpropagation should be able to generate an error without the next layer's error, as ActivationWithLossLayer's does.
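For example, the failing two-FullyConnectedLayer configuration above could be made to work by appending an ActivationWithLoss layer at the end. The builder class and setters below are only assumptions modeled on FullyConnectedLayerConfigurationBuilder, not necessarily the ones added in this PR:

// Illustrative only: ActivationWithLossLayerConfigurationBuilder and its setters are assumed,
// not taken from this PR. The point is that the last layer should be one that can generate
// an error on its own, without needing a next layer's error.
    .addLayerConfiguration(
        ActivationWithLossLayerConfigurationBuilder.newConfigurationBuilder()
            .setActivationFunction("softmax")   // hypothetical setter
            .setLossFunction("crossEntropy")    // hypothetical setter
            .build())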

jsjason commented 8 years ago

@beomyeol Thanks. Another question: I know I'm asking you the same thing over and over again, but I'm confused yet again about excluding the first layer from backpropagation. If we had n layers (not counting the input layer), shouldn't we have exactly n error vectors (matrices)? This code seems to output n-1 error vectors.

beomyeol commented 8 years ago

@jsjason Yes, this code generates n-1 error vectors. The point you may be missing is that the last layer cannot be a learnable layer. Suppose all layers except the last one are learnable. The network model can then generate parameter updates for these n-1 layers with n-1 error vectors. To be specific, the parameter update for the (n-1)-th layer is computed from the error generated by the n-th layer's backpropagation, and the parameter update for the (n-2)-th layer is computed from the error generated by the (n-1)-th layer's backpropagation. That is why n-1 error vectors are enough. Please let me know if you are still confused.
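Roughly, the flow looks like the sketch below (the method names are illustrative and the calls are simplified, not the actual Dolphin code):

// Sketch only: shows why n-1 error vectors suffice when the last layer is an
// ActivationWithLoss-style layer, which has no learnable parameters of its own.
INDArray error = lossLayer.derivative(activations[n - 1], expectedOutput);  // 1st error vector
for (int i = n - 2; i >= 0; i--) {
  // The parameter update for layer i uses the error produced by layer i + 1.
  layers[i].update(error);
  if (i > 0) {
    // Layer i produces the error that layer i - 1 will use; the first layer (i == 0)
    // does not need to, since no learnable layer sits before it.
    error = layers[i].backPropagate(error);  // one of the remaining n - 2 error vectors
  }
}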

jsjason commented 8 years ago

@beomyeol Thanks, I understand what's going on in this PR now. I'll merge this after testing.