mikegashler / waffles

A toolkit of machine learning algorithms.
http://gashler.com/mike/waffles/

Separate activation functions #31

Closed · thelukester92 closed this issue 8 years ago

thelukester92 commented 8 years ago

Several ML libraries, such as Torch, separate activation functions into their own layers (in Torch, they call it a "transfer layer" or "transfer function"). If we separated activations in this way, it would be even easier to implement parameterized activation functions, and new layers would not need to worry about passing activation function parameters along to the optimizer.

Here's the idea. Under the current design, we would do something like this:

GNeuralNet nn;
nn.addLayer(new GLayerClassic(FLEXIBLE_SIZE, 300, new GActivationTanh()));

The proposed method is this:

GNeuralNet nn;
nn.addLayer(new GLayerClassic(FLEXIBLE_SIZE, 300));
nn.addLayer(new GTanhLayer());

Or, using the convenience method GNeuralNet::addLayers:

GNeuralNet nn;
nn.addLayers(300, new GTanhLayer());

What are your thoughts on this? This is another potentially breaking change: the default "activation" would be the identity, and there would be no need to "deactivate" a layer, because backprop through the activation layer would already be "deactivating" the blame.
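
For illustration, here is a rough sketch of what such a standalone tanh layer could look like (the class and method names here are hypothetical, not actual waffles code); the point is that backprop through it just scales the blame by the activation's derivative:

// Hypothetical sketch only; the real layer interface may differ.
#include <cmath>
#include <cstddef>
#include <vector>

class GTanhLayerSketch
{
public:
    // Forward pass: apply tanh element-wise; this layer has no weights to train.
    void feedForward(const std::vector<double>& in, std::vector<double>& out)
    {
        out.resize(in.size());
        for(size_t i = 0; i < in.size(); i++)
            out[i] = std::tanh(in[i]);
    }

    // Backward pass: "deactivate" the blame by scaling it with the derivative
    // of tanh, 1 - tanh(x)^2, evaluated at the stored activation values.
    void backProp(const std::vector<double>& activation, std::vector<double>& blame)
    {
        for(size_t i = 0; i < blame.size(); i++)
            blame[i] *= 1.0 - activation[i] * activation[i];
    }
};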

thelukester92 commented 8 years ago

This could be explored after #26 is merged, if there is any interest in this.

mikegashler commented 8 years ago

Yep, that seems like a better design to me. I guess it's best to make breaking changes in close succession, rather than spreading them out.

StephenAshmore commented 8 years ago

I agree as well. It might be a bit annoying for layers that differ from the current GLayerClassic but still support differing activation functions: for something like a hypothetical GSpecialLayer, we would need to create GSpecialLayerTanH, GSpecialLayerHinge, GSpecialLayerBentIdentity, and so on. While not difficult, that could be slightly annoying. Luckily, I don't see any cases in our current code where this will be an issue.

thelukester92 commented 8 years ago

In addition to specific activation layers (GActivationLayerTanh, etc.), we could have a general-purpose activation layer that takes a GActivation as input:

nn.addLayers(
    500, new GActivationLayerTanh(),
    300, new GActivationLayer(new GActivationSoftPlus()),
    FLEXIBLE_SIZE
);
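
Roughly, the general-purpose layer would just delegate to whatever activation object it is given, so we would not need a separate layer subclass per activation (which also answers the GSpecialLayer concern above). A sketch, using a stand-in interface rather than the real GActivationFunction:

// Hypothetical sketch; the activation interface shown is a stand-in.
#include <cstddef>
#include <vector>

struct GActivationSketch                      // stand-in for GActivationFunction
{
    virtual ~GActivationSketch() {}
    virtual double squash(double x) = 0;      // the activation itself
    virtual double derivative(double x) = 0;  // used when backprop "deactivates" the blame
};

class GActivationLayerSketch                  // stand-in for the proposed general-purpose layer
{
public:
    explicit GActivationLayerSketch(GActivationSketch* pAct) : m_pAct(pAct) {}

    // Forward pass delegates to the wrapped activation; backprop would
    // delegate to derivative() in the same way.
    void feedForward(const std::vector<double>& in, std::vector<double>& out)
    {
        out.resize(in.size());
        for(size_t i = 0; i < in.size(); i++)
            out[i] = m_pAct->squash(in[i]);
    }

private:
    GActivationSketch* m_pAct;                // any activation plugs in here, no subclassing needed
};
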
thelukester92 commented 8 years ago

Here's a problem we'll need to solve. Right now, we can do something like this:

nn.addLayers(
    new GLayerClassic(FLEXIBLE_SIZE, 500),
    new GLayerClassic(FLEXIBLE_SIZE, 500),
    new GLayerClassic(FLEXIBLE_SIZE, FLEXIBLE_SIZE)
);

How can we accomplish the same thing with the new approach? Right now, beginIncrementalLearning resizes the input and output layers to fit if they have a flexible size, but the second-to-last layer will also need to be resized. Consider these layers, for example, where the second-to-last layer is the GLayerLinear(FLEXIBLE_SIZE):

nn.addLayers(
    new GLayerLinear(500),
    new GLayerTanH(),
    new GLayerLinear(500),
    new GLayerTanH(),
    new GLayerLinear(FLEXIBLE_SIZE),
    new GLayerTanH()
);
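
To make the problem concrete, here is roughly what a resizing pass at beginIncrementalLearning time would have to do. (The resize() method and the isSizeless() predicate, true for pure activation layers that mirror their input size, are assumed names for illustration, not the actual interface.)

// Hypothetical sketch of the resizing pass; the layer interface shown is assumed,
// and the waffles declarations for GNeuralNetLayer and FLEXIBLE_SIZE are presumed in scope.
#include <cstddef>
#include <vector>

void resizeToFit(std::vector<GNeuralNetLayer*>& layers, size_t featureDims, size_t labelDims)
{
    size_t inCount = featureDims;
    for(size_t i = 0; i < layers.size(); i++)
    {
        size_t outCount;
        if(layers[i]->isSizeless())                    // e.g. GLayerTanH: outputs mirror inputs
            outCount = inCount;
        else if(layers[i]->outputs() == FLEXIBLE_SIZE) // e.g. the last GLayerLinear above
            outCount = labelDims;                      // pin the flexible output end to the labels
        else
            outCount = layers[i]->outputs();           // a fixed size such as GLayerLinear(500)
        layers[i]->resize(inCount, outCount);
        inCount = outCount;                            // the next layer's inputs must match
    }
}
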
mikegashler commented 8 years ago

I'm not sure whether "FLEXIBLE_SIZE" is more helpful or more confusing. For hidden layers, we always seem to use it on the input end and never on the output end. Also, the sizes of the input and output ends of the whole network are currently adjusted automatically when you call beginIncrementalLearning, even if they are not specified as having a flexible size. So, I'm open to dumping "FLEXIBLE_SIZE" as a concept.

Since issue #32 recommends that we support multiple layers feeding into one layer, some careful designing will be necessary. I see two overarching approaches:

1. Each layer knows about all the layers that feed into it (think of some kind of doubly-linked list), and the layers automatically adjust themselves to fit with each other.
2. Users are required to explicitly specify the total number of inputs feeding into each layer as well as the number of outputs. We throw an exception if the layers don't fit together nicely, and we throw an exception if users try to retrain a neural network using data of the wrong size.

I currently lean toward option 2. It's simpler, and we can always figure out how to add convenience methods later.
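
To make option 2 concrete, the size check could be as simple as the following sketch. (GNeuralNetLayer and its inputs()/outputs() accessors are used here loosely, as placeholders for whatever interface we settle on.)

// Hypothetical validation pass for option 2: sizes are explicit, and mismatched
// adjacent layers raise an error instead of being adjusted automatically.
#include <sstream>
#include <stdexcept>
#include <vector>

void validateLayerSizes(const std::vector<GNeuralNetLayer*>& layers)
{
    for(size_t i = 1; i < layers.size(); i++)
    {
        size_t expected = layers[i]->inputs();
        size_t produced = layers[i - 1]->outputs();
        if(expected != produced)
        {
            std::ostringstream oss;
            oss << "Layer " << i << " expects " << expected
                << " inputs, but the previous layer produces " << produced << " outputs";
            throw std::runtime_error(oss.str());
        }
    }
}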

thelukester92 commented 8 years ago

I think that when we construct a layer, we should only have to specify the number of neurons in that layer (the outputs). That part is simple enough to accomplish. Perhaps we should drop FLEXIBLE_SIZE for outputs but allow it for inputs?

In my example above, I specified just the output size, implicitly using flexible-sized inputs. Maybe we should move in that direction and drop explicit FLEXIBLE_SIZE entirely, requiring a number of neurons (the output size) for all layers.

thelukester92 commented 8 years ago

I have pushed changes that use this syntax (verbose):

size_t outputs = 10;
nn.addLayer(new GLayerLinear(500));
nn.addLayer(new GLayerActivation(new GActivationTanH()));
nn.addLayer(new GLayerLinear(500));
nn.addLayer(new GLayerActivation(new GActivationTanH()));
nn.addLayer(new GLayerLinear(outputs));
nn.addLayer(new GLayerActivation(new GActivationTanH()));

But, with three convenience methods added, this becomes very readable:

- addLayer(size_t outputs) builds a GLayerLinear with flexible inputs and the given number of outputs (this method previously created a GLayerClassic with tanh activation).
- addLayer(GActivationFunction *) builds a GLayerActivation from the given activation function (this is a new method).
- addLayers(...) (which was already there) adds multiple layers at once.

Here's the pretty version of the snippet above:

size_t outputs = 10;
nn.addLayers(
    500, new GActivationTanH(),
    500, new GActivationTanH(),
    outputs, new GActivationTanH()
);
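
(For the curious, one possible way to wire up such an overloaded addLayers is sketched below using C++ variadic templates. This is only an illustration of the dispatching idea, with stub types standing in for the real waffles classes; the actual implementation may use a different mechanism.)

// Hypothetical sketch of argument dispatching for an addLayers-style method.
#include <cstddef>

// Stand-ins for the real waffles types, for illustration only.
struct GActivationFunction {};
struct GNeuralNetLayer {};

class GNeuralNetSketch
{
public:
    // The three addLayer forms described above (stubbed here).
    void addLayer(size_t outputs) { /* build a GLayerLinear with 'outputs' units */ }
    void addLayer(GActivationFunction* pAct) { /* wrap it in a GLayerActivation */ }
    void addLayer(GNeuralNetLayer* pLayer) { /* append the given layer directly */ }

    // Variadic convenience method: dispatch each argument to the matching addLayer.
    void addLayers() {}
    template <typename First, typename... Rest>
    void addLayers(First first, Rest... rest)
    {
        addLayer(first);
        addLayers(rest...);
    }
};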

I am quite content with this interface. Any objections?

All tests are passing, there is minimal impact on existing code that uses GLayerClassic (I left it in as a "legacy" option for a linear layer with a built-in activation function), and wall time is comparable: in a quick test, I got 58 seconds for 38% error on MNIST before the change and 57 seconds for 38% error after the change.

thelukester92 commented 8 years ago

Also, FYI, I call it GLayerLinear because it produces a linear combination of the inputs for each output neuron. I think that is sufficiently descriptive, but let me know if it is not.
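
Concretely (just an illustration of the math, not the waffles code), each output is a weighted sum of the inputs plus a bias:

// Illustrative forward pass of a linear layer: out = W * in + b.
#include <cstddef>
#include <vector>

void linearForward(const std::vector< std::vector<double> >& weights, // weights[j][i]
                   const std::vector<double>& bias,                   // one bias per output
                   const std::vector<double>& in,
                   std::vector<double>& out)
{
    out.assign(bias.begin(), bias.end());
    for(size_t j = 0; j < out.size(); j++)
        for(size_t i = 0; i < in.size(); i++)
            out[j] += weights[j][i] * in[i];
}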

thelukester92 commented 8 years ago

Done and ready to merge. Any objections?

thelukester92 commented 8 years ago

Merged into branch for #26

mikegashler commented 8 years ago

Nope. Make it happen.

thelukester92 commented 8 years ago

Merged.