I second this. Currently not even l1 or l2 regularization works with recurrent layers.
I also think that regularization for recurrent layers would be an important feature to add to keras. If there is interest in adding regularization to recurrent layers, I can look into the issue and try to implement it.
Definitely. I'd do it but I lack the skills
Alright, I will probably need a couple of weeks for this, but I will be working on it.
Unfortunately it's not explained precisely, but if I'm not wrong, it seems that you can already do that by adding a regularization "layer": https://groups.google.com/forum/#!topic/keras-users/3dCCkyzdHA4
That thread is a year old. Last time I tried it like that, it didn't work.
Adding feature dropout to a model with model.add(Dropout(...)) does work. However, the solution presented in the linked paper suggests not applying dropout to h_t-1 where it affects c_t. This means that dropout is only applied to h_t-1 when calculating the output / h_t. I looked through the code, and I think such a feature can only be added by introducing an additional parameter to the LSTM layer. I have already started implementing the feature, but I still have to test the implementation.
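To make that placement concrete, here is a toy single-step sketch in plain NumPy (all names and helpers are made up for illustration; this is not Keras code): the dropped-out copy of h_tm1 feeds only the output gate, so the new cell state c_t is computed from the untouched h_tm1.

```python
# Toy NumPy sketch of one LSTM step with dropout applied to h_tm1 only on the
# path into the output gate (illustration only, not the Keras implementation).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_tm1, c_tm1, W, U, b, p_drop=0.5):
    # W, U, b are dicts of per-gate parameters keyed by 'i', 'f', 'c', 'o'.
    mask = (np.random.rand(*h_tm1.shape) > p_drop) / (1.0 - p_drop)
    h_drop = h_tm1 * mask  # dropped-out copy, used only for the output gate

    i = sigmoid(x_t @ W['i'] + h_tm1 @ U['i'] + b['i'])
    f = sigmoid(x_t @ W['f'] + h_tm1 @ U['f'] + b['f'])
    c_t = f * c_tm1 + i * np.tanh(x_t @ W['c'] + h_tm1 @ U['c'] + b['c'])
    o = sigmoid(x_t @ W['o'] + h_drop @ U['o'] + b['o'])  # dropout only here
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```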
Unfortunately, my approach to implementing the feature did not work out. I tried to implement dropout for h_tm1 only in the model specification of the LSTM layer. To implement dropout it is necessary to use a random number generator that takes h_tm1.shape as an argument. However, even once the model is compiled, the symbolic tensor h_tm1 still does not have any concrete value, meaning that h_tm1.shape cannot be evaluated.
You can find my approach to implementing the feature and the resulting error message below:

```python
class LSTM(Recurrent):
    '''Long-Short Term Memory unit - Hochreiter 1997.

    For a step-by-step description of the algorithm, see
    [this tutorial](http://deeplearning.net/tutorial/lstm.html).

    # Arguments
        output_dim: dimension of the internal projections and the final output.
        init: weight initialization function.
            Can be the name of an existing function (str),
            or a Theano function (see: [initializations](../initializations.md)).
        inner_init: initialization function of the inner cells.
        forget_bias_init: initialization function for the bias of the forget gate.
            [Jozefowicz et al.](http://www.jmlr.org/proceedings/papers/v37/jozefowicz15.pdf)
            recommend initializing with ones.
        activation: activation function.
            Can be the name of an existing function (str),
            or a Theano function (see: [activations](../activations.md)).
        inner_activation: activation function for the inner cells.

    # References
        - [Long short-term memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf) (original 1997 paper)
        - [Learning to forget: Continual prediction with LSTM](http://www.mitpressjournals.org/doi/pdf/10.1162/089976600300015015)
        - [Supervised sequence labelling with recurrent neural networks](http://www.cs.toronto.edu/~graves/preprint.pdf)
    '''
    def __init__(self, output_dim,
                 init='glorot_uniform', inner_init='orthogonal',
                 forget_bias_init='one', activation='tanh',
                 inner_activation='hard_sigmoid', dropout=None,
                 **kwargs):
        self.output_dim = output_dim
        self.init = initializations.get(init)
        self.inner_init = initializations.get(inner_init)
        self.forget_bias_init = initializations.get(forget_bias_init)
        self.activation = activations.get(activation)
        self.inner_activation = activations.get(inner_activation)
        self.dropout = dropout
        super(LSTM, self).__init__(**kwargs)

    def build(self):
        input_shape = self.input_shape
        input_dim = input_shape[2]
        self.input_dim = input_dim
        self.input = K.placeholder(input_shape)

        if self.stateful:
            self.reset_states()
        else:
            # initial states: 2 all-zero tensors of shape (output_dim)
            self.states = [None, None]

        self.W_i = self.init((input_dim, self.output_dim))
        self.U_i = self.inner_init((self.output_dim, self.output_dim))
        self.b_i = K.zeros((self.output_dim,))

        self.W_f = self.init((input_dim, self.output_dim))
        self.U_f = self.inner_init((self.output_dim, self.output_dim))
        self.b_f = self.forget_bias_init((self.output_dim,))

        self.W_c = self.init((input_dim, self.output_dim))
        self.U_c = self.inner_init((self.output_dim, self.output_dim))
        self.b_c = K.zeros((self.output_dim,))

        self.W_o = self.init((input_dim, self.output_dim))
        self.U_o = self.inner_init((self.output_dim, self.output_dim))
        self.b_o = K.zeros((self.output_dim,))

        self.params = [self.W_i, self.U_i, self.b_i,
                       self.W_c, self.U_c, self.b_c,
                       self.W_f, self.U_f, self.b_f,
                       self.W_o, self.U_o, self.b_o]

        if self.initial_weights is not None:
            self.set_weights(self.initial_weights)
            del self.initial_weights

    def reset_states(self):
        assert self.stateful, 'Layer must be stateful.'
        input_shape = self.input_shape
        if not input_shape[0]:
            raise Exception('If a RNN is stateful, a complete ' +
                            'input_shape must be provided ' +
                            '(including batch size).')
        if hasattr(self, 'states'):
            K.set_value(self.states[0],
                        np.zeros((input_shape[0], self.output_dim)))
            K.set_value(self.states[1],
                        np.zeros((input_shape[0], self.output_dim)))
        else:
            self.states = [K.zeros((input_shape[0], self.output_dim)),
                           K.zeros((input_shape[0], self.output_dim))]

    def step(self, x, states):
        assert len(states) == 2
        h_tm1 = states[0]
        c_tm1 = states[1]

        x_i = K.dot(x, self.W_i) + self.b_i
        x_f = K.dot(x, self.W_f) + self.b_f
        x_c = K.dot(x, self.W_c) + self.b_c
        x_o = K.dot(x, self.W_o) + self.b_o

        if self.dropout:
            h_tm1_dropout = K.dropout(x=h_tm1, level=self.dropout)
            i = self.inner_activation(x_i + K.dot(h_tm1, self.U_i))
            f = self.inner_activation(x_f + K.dot(h_tm1, self.U_f))
            c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1, self.U_c))
            o = self.inner_activation(x_o + K.dot(h_tm1_dropout, self.U_o))
            h = o * self.activation(c)
        else:
            i = self.inner_activation(x_i + K.dot(h_tm1, self.U_i))
            f = self.inner_activation(x_f + K.dot(h_tm1, self.U_f))
            c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1, self.U_c))
            o = self.inner_activation(x_o + K.dot(h_tm1, self.U_o))
            h = o * self.activation(c)
        return h, [h, c]

    def get_config(self):
        config = {"output_dim": self.output_dim,
                  "init": self.init.__name__,
                  "inner_init": self.inner_init.__name__,
                  "forget_bias_init": self.forget_bias_init.__name__,
                  "activation": self.activation.__name__,
                  "inner_activation": self.inner_activation.__name__}
        base_config = super(LSTM, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
```
```
Traceback (most recent call last):
  File "C:/Users/Bianca/ownCloud/Programming Projects/Python/Dropout_RNNs/dropout/test.py", line 53, in <module>
    class_mode="binary")
  File "C:\Users\Bianca\ownCloud\Programming Projects\Python\Dropout_RNNs\dropout\keras\keras\models.py", line 480, in compile
    self._train = K.function(train_ins, [train_loss], updates=updates)
  File "C:\Users\Bianca\ownCloud\Programming Projects\Python\Dropout_RNNs\dropout\keras\keras\backend\theano_backend.py", line 398, in function
    return Function(inputs, outputs, updates=updates)
  File "C:\Users\Bianca\ownCloud\Programming Projects\Python\Dropout_RNNs\dropout\keras\keras\backend\theano_backend.py", line 390, in __init__
    allow_input_downcast=True, **kwargs)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\compile\function.py", line 317, in function
    output_keys=output_keys)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\compile\pfunc.py", line 461, in pfunc
    output_keys=output_keys)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\compile\function_module.py", line 1771, in orig_function
    output_keys=output_keys).create(
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\compile\function_module.py", line 1423, in __init__
    accept_inplace)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\compile\function_module.py", line 177, in std_fgraph
    update_mapping=update_mapping)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\gof\fg.py", line 171, in __init__
    self.__import_r__(output, reason="init")
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\gof\fg.py", line 363, in __import_r__
    self.__import__(variable.owner, reason=reason)
  File "C:\Program Files (x86)\Python35-32\lib\site-packages\theano-0.8.0.dev0-py3.5.egg\theano\gof\fg.py", line 477, in __import__
    r)
theano.gof.fg.MissingInputError: ("An input of the graph, used to compute Shape(<TensorType(float32, matrix)>), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, matrix)>)
```
It may be possible to implement the feature within the get_input / get_output functions, but since it requires applying dropout to a variable (h_tm1) that is specified and manipulated in LSTM.step, I am not sure. I will keep looking into it, but the issue will move further down my priority list. If anyone else has ideas for different approaches, feel free to let me know or to try them out yourself.
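One idea for getting around the shape problem (an untested sketch, not working code): build the dropout mask outside of step(), from the layer input x, whose symbolic batch dimension is already available when the graph is constructed, and hand it to step() as an extra constant. This assumes a get_constants()-style hook for passing such tensors into step(), which newer Keras versions provide; with the base class used here the mask would have to be threaded through by hand.

```python
# Untested sketch: build the recurrent-dropout mask outside step() so that
# K.dropout never needs the (still symbolic) shape of h_tm1.
# Assumes backend helpers K.ones_like / K.reshape / K.tile are available and
# that the mask can be passed into step() as a constant (get_constants-style).
def get_constants(self, x):
    constants = []
    if self.dropout:
        # (batch, 1) tensor of ones derived from the input x; its symbolic
        # batch dimension is known at graph-construction time.
        ones = K.ones_like(K.reshape(x[:, 0, 0], (-1, 1)))
        # Tile to (batch, output_dim) so the mask matches h_tm1.
        ones = K.tile(ones, (1, self.output_dim))
        # At test time this mask would have to be replaced by plain ones
        # (e.g. via a training-phase switch).
        mask = K.dropout(ones, level=self.dropout)
        constants.append(mask)
    return constants

# Inside step(), the precomputed mask then replaces the direct K.dropout call:
#     h_tm1_dropout = h_tm1 * mask
```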
Don't know where y'all landed on this... but have you seen the `dropout_W` and `dropout_U` parameters added to all recurrent layers?
https://github.com/fchollet/keras/blob/master/keras/layers/recurrent.py#L296
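For reference, a minimal usage sketch with those arguments (Keras 1.x-era API; the layer size and input shape below are placeholders, and the argument names may differ in other versions):

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 10, 8  # placeholder input shape

model = Sequential()
model.add(LSTM(64, input_shape=(timesteps, features),
               dropout_W=0.2,   # dropout on the input connections
               dropout_U=0.2))  # dropout on the recurrent connections
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
```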
Any update on this issue?
To echo what @sallamander said, you can now pass regularizers as arguments to LSTM layers, e.g.:

```python
LSTM(100, input_shape=(T, P), W_regularizer=l2(0.01))
```

The default regularizer arguments are:
`W_regularizer=None, U_regularizer=None, b_regularizer=None, dropout_W=0.0, dropout_U=0.0`
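Putting the regularizers and the recurrent dropout together, a fully specified layer might look roughly like this (T and P are placeholder dimensions, as in the example above):

```python
from keras.layers import LSTM
from keras.regularizers import l2

T, P = 10, 8  # placeholder sequence length and feature dimension

layer = LSTM(100, input_shape=(T, P),
             W_regularizer=l2(0.01),   # penalty on input-to-hidden weights
             U_regularizer=l2(0.01),   # penalty on recurrent weights
             b_regularizer=l2(0.01),   # penalty on biases
             dropout_W=0.2, dropout_U=0.2)
```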
Who can add this to the Keras docs? Can you also use l1 and l1_l2 (elastic)? I'm getting an error saying l1 is not found...

EDIT: Needed this: `from keras.regularizers import l1, l2`. But I still can't get l1_l2.
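A hedged note on the name: in the Keras version these comments refer to, the combined L1+L2 penalty is, as far as I recall, exported as `l1l2` rather than `l1_l2`, so the import below may be what's missing (treat the exact spelling and signature as version-dependent):

```python
# Assumption: this Keras version exposes the elastic-net regularizer as
# `l1l2` (the `l1_l2` spelling arrived in later releases). Verify against
# your installed keras.regularizers module.
from keras.regularizers import l1, l2, l1l2

reg = l1l2(l1=0.01, l2=0.01)  # combined L1 + L2 penalty
```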
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.
As described in http://arxiv.org/pdf/1409.2329.pdf