keras-team / keras

Deep Learning for humans
http://keras.io/
Apache License 2.0
61.97k stars 19.46k forks source link

Norm constraints don't act properly on non 'Dense' layers #474

Closed the-moliver closed 8 years ago

the-moliver commented 9 years ago

The UnitNorm and MaxNorm constraints currently work somewhat differently from each other, and both are problematic on certain layer types. For example, if one wanted unit-norm convolutional filter weights, these constraints would do something, but not quite the right thing, since they compute norms over hard-coded dimensions. I'm not sure what the best general solution to this problem is, so I wanted to open it up for feedback. The two main options I can think of are:

1) standardize weight matrices so that the dimensions the norms are aggregated over are always in the same place, so the constraints are applied properly during updates, and then reshape the weights to the proper form when computing the layer output.

2) have the constraints.get function call in each layer also take an argument that determines which dimension(s) are aggregated over to compute the norm (an argument that would simply be ignored by constraints that don't need it, e.g. NonNeg).

Thinking this through while writing it, I'm pretty sure 2 is the best option (a rough sketch follows). What do you all think?
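A minimal sketch of what option 2 might look like, just to make the idea concrete (the unit_norm function and its axis argument are hypothetical, not existing Keras code, and NumPy stands in for the backend):

import numpy as np

def unit_norm(W, axis=0):
    # Option-2 style: the layer applying the constraint passes the
    # dimension(s) to aggregate over when computing the norm.
    norms = np.sqrt(np.sum(W**2, axis=axis, keepdims=True))
    return W / (norms + 1e-7)

# A dense layer with weights of shape (n_inputs, n_units) would pass axis=0;
# a convolutional layer would pass whatever axes span a single filter.
W = unit_norm(np.random.randn(784, 10), axis=0)
print(np.sqrt(np.sum(W[:, 0]**2)))  # ~1.0 for each hidden unit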

fchollet commented 9 years ago

Wouldn't the right dimension always be the last one, though? Except in the case of convolution kernels.

Do constraints make sense on convolution kernels? I don't think I have ever seen that in the literature.

the-moliver commented 9 years ago

It's a problem for the maxout layer as well, where you'd like to constrain the norm of the feature weights. And it does make sense to constrain the norm of convolution kernels, for the same reason it does for fully connected layers; it was actually done in the original Dropout paper, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting", in their analysis of Street View House Numbers: "Max-norm regularization was used for weights in both convolutional and fully connected layers."

Also, the last dimension is not the one you want to sum over if you want to constrain the norm of the incoming weight vector of each hidden unit, so the UnitNorm constraint is wrong; the MaxNorm constraint works correctly for dense layers, but not for maxout or convolutional layers. You can see how the UnitNorm constraint fails here:

import numpy as np

# example weight matrix: 784 inputs, 10 hidden units
W = np.random.randn(784, 10)
# constrain like UnitNorm: normalizes over the last axis
W /= np.sqrt(np.sum(W**2, axis=-1, keepdims=True))
# check the norm of the incoming weight vector of the first hidden unit -- not 1
np.sqrt(np.sum(W[:, 0]**2))
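For contrast, summing over the first axis (the input dimension) gives the behaviour you actually want here: each hidden unit's incoming weight vector ends up with unit norm.

# normalize over the input dimension instead
W = np.random.randn(784, 10)
W /= np.sqrt(np.sum(W**2, axis=0, keepdims=True))
# now the incoming weight vector of each hidden unit has unit norm
np.sqrt(np.sum(W[:, 0]**2))  # 1.0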
fchollet commented 9 years ago

Ok, I see. We should definitely have the constraint functions accept an argument to configure the norm axis. Then each layer can customize its call to the constraint function with the proper axis.
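A rough sketch of how that could look (this MaxNorm is hypothetical, written with NumPy rather than the actual backend, and the axis values are only illustrative; they depend on each layer's kernel layout):

import numpy as np

class MaxNorm(object):
    """Hypothetical max-norm constraint with a configurable norm axis."""
    def __init__(self, max_value=2.0, axis=0):
        self.max_value = max_value
        self.axis = axis  # dimension(s) to aggregate over

    def __call__(self, W):
        norms = np.sqrt(np.sum(W**2, axis=self.axis, keepdims=True))
        clipped = np.clip(norms, 0, self.max_value)
        # rescale any weight vector whose norm exceeds max_value
        return W * (clipped / (norms + 1e-7))

# A Dense layer with weights of shape (n_inputs, n_units) would pass axis=0;
# a convolutional layer would pass the axes spanning a single filter, e.g.
# axis=(1, 2, 3) for a kernel shaped (nb_filter, stack_size, rows, cols).
dense_constraint = MaxNorm(max_value=2.0, axis=0)
conv_constraint = MaxNorm(max_value=2.0, axis=(1, 2, 3))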