Closed bzamecnik closed 6 years ago
This is what is desperately needed. Plus, in layer_CuDNN_LSTM, the dtype = float16 is not enabled for Nvidia's CUDA 9.1 FP16 training.
I guess applying Dropout(x)(inputs) before the LSTM layer will do the same as the dropout, right? Or do you think it can cause a slowdown?
This would be tremendously helpful to many, many people. Not being able to use dropout often renders CuDNN layers virtually useless for training on smaller datasets.
Recurrent dropout is still not supported by TensorFlow; if you would like to see it, please submit the request there. Input dropout can easily be achieved by manually adding a Dropout layer before the CuDNN RNN layer.
@tRosenflanz are you sure? https://github.com/tensorflow/tensorflow/issues/6466#issuecomment-339517889
If I am reading the tensorflow thread right, it says that the dropout they support is applied between layers only and not on the hidden states that get passed from CuDNN cell to CuDNN cell within one layer. The dropout they support is equivalent to adding a dropout layer yourself as far as I understand.
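To make the distinction above concrete, here is a minimal NumPy sketch of the two kinds of dropout on a toy tanh RNN. It is not the cuDNN or Keras implementation; the helper names (rnn_layer, dropout), the shapes, and the choice to reuse one recurrent mask for every timestep are assumptions made for illustration.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero entries with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return x * (rng.random(x.shape) < keep) / keep

def rnn_layer(x, W, U, rng, recurrent_rate=0.0):
    """Toy tanh RNN over x of shape (timesteps, features).

    With recurrent_rate > 0, a dropout mask is applied to the hidden state
    that is passed from one timestep to the next inside this layer.
    (Assumption: the same mask is reused at every timestep.)
    """
    units = U.shape[0]
    keep = 1.0 - recurrent_rate
    mask = (rng.random(units) < keep) / keep if recurrent_rate > 0 else np.ones(units)
    h = np.zeros(units)
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ W + (h * mask) @ U)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))  # (timesteps, features)
W1, U1 = rng.normal(size=(3, 4)), rng.normal(size=(4, 4))
W2, U2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))

# Between-layer dropout: applied to the outputs of layer 1 before layer 2.
# This is what an explicit Dropout layer between two RNN layers gives you.
h1 = rnn_layer(x, W1, U1, rng)
between = rnn_layer(dropout(h1, 0.5, rng), W2, U2, rng)

# Recurrent dropout: applied to the state inside a single layer.
recurrent = rnn_layer(x, W1, U1, rng, recurrent_rate=0.5)
```

The second variant needs access to the hidden state at every step, which a fused kernel that runs the whole sequence in one call does not expose.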
@tRosenflanz Sorry, you're right. http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#cudnnDropoutDescriptor_t says that "Dropout will be applied between layers", and this applies to the latest CuDNN. :( ref: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/cudnn_rnn/kernels/cudnn_rnn_ops.cc#L549
Recurrent dropout is not implemented in cuDNN RNN ops. At the cuDNN level. So we can't have it in Keras.
The dropout option in the cuDNN API is not recurrent dropout (unlike what is in Keras), so it is basically useless (regular dropout doesn't work with RNNs).
Actually using such dropout in a stacked RNN will wreck training.
Will time-distributed dropout solve the problem? Something like this:
...
for idx in range(num_layers):
    top_layer = idx == num_layers - 1
    # every layer except the top one must return full sequences for the next layer
    layer = CuDNNLSTM(..., return_sequences=not top_layer)(layer)
    if not top_layer:
        layer = TimeDistributed(Dropout(dropout))(layer)
...
Oh, it looks even simpler according to the documentation: https://keras.io/layers/core/#dropout
noise_shape: 1D integer tensor representing the shape of the binary dropout mask that will be multiplied with the input. For instance, if your inputs have shape (batch_size, timesteps, features) and you want the dropout mask to be the same for all timesteps, you can use noise_shape=(batch_size, 1, features).
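As a rough illustration of what that noise_shape setting does, the NumPy sketch below samples one mask of shape (batch_size, 1, features) and lets broadcasting reuse it at every timestep. The function name dropout_shared_over_time is hypothetical, and this only mimics the masking behaviour described in the documentation quote rather than calling Keras.

```python
import numpy as np

def dropout_shared_over_time(x, rate, rng):
    """Apply one dropout mask per sample, shared across all timesteps.

    x has shape (batch_size, timesteps, features). The mask has shape
    (batch_size, 1, features), so broadcasting repeats it along axis 1.
    """
    batch_size, _, features = x.shape
    keep = 1.0 - rate
    mask = (rng.random((batch_size, 1, features)) < keep) / keep
    return x * mask

rng = np.random.default_rng(0)
x = np.ones((2, 4, 3))  # (batch_size, timesteps, features)
y = dropout_shared_over_time(x, 0.5, rng)

# Every timestep of a given sample sees the same dropped features.
same_mask_each_step = all(np.array_equal(y[:, 0], y[:, t]) for t in range(x.shape[1]))
```

Note that this still masks the inputs to a layer, not the hidden state carried between timesteps.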
This will not produce recurrent dropout. It applies dropout between layers of the network, while recurrent dropout works on the states that are passed within the same layer. Since the CuDNN layer works by calling cuDNN directly and doesn't rely on the cell implementation, the Keras team cannot do anything about it.
@fchollet can you elaborate on these comments:
regular dropout doesn't work with RNNs
Actually using such dropout in a stacked RNN will wreck training.
Any updates on this problem? The built-in dropout truly wrecks training.
This paper mentions using DropConnect (Dropout applied to the weights, instead of the state vector) on the recurrent weights in an LSTM in order to have some dropout without changing the cuDNN implementation. They say that for each batch in training they perform dropout on the weights before the forward and backward propagation, and repeat for the next batch. From the paper:
We propose the use of DropConnect (Wan et al., 2013) on the recurrent hidden to hidden weight matrices which does not require any modifications to an RNN’s formulation. As the dropout operation is applied once to the weight matrices, before the forward and backward pass, the impact on training speed is minimal and any standard RNN implementation can be used, including inflexible but highly optimized black box LSTM implementations such as NVIDIA’s cuDNN LSTM.
By performing DropConnect on the hidden-to-hidden weight matrices [Ui,Uf,Uo,Uc] within the LSTM, we can prevent overfitting from occurring on the recurrent connections of the LSTM. This regularization technique would also be applicable to preventing overfitting on the recurrent weight matrices of other RNN cells.
Is there any interest in implementing this as an option? I am not totally familiar with how dropout is applied in the Model and Sequential classes, but hopefully this would not be too hard to implement.
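For readers unfamiliar with the idea, here is a minimal NumPy sketch of DropConnect on a recurrent weight matrix, as described in the quote above: the mask is sampled once per batch on the weights, and the unmodified RNN loop then runs with the masked matrix. It uses a toy tanh RNN with a single matrix U instead of the four LSTM matrices, and the helper names (dropconnect, forward_one_batch) and the rescaling by the keep probability are assumptions, not the paper's or Keras's code.

```python
import numpy as np

def dropconnect(U, rate, rng):
    """Zero individual weights with probability `rate` and rescale the rest."""
    keep = 1.0 - rate
    return U * (rng.random(U.shape) < keep) / keep

def forward_one_batch(x, W, U, rate, rng):
    """Toy tanh RNN forward pass over x of shape (timesteps, features).

    The recurrent weights are masked once, before the loop, so the RNN
    itself needs no changes and every timestep uses the same masked U.
    """
    U_dropped = dropconnect(U, rate, rng)
    h = np.zeros(U.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W + h @ U_dropped)
    return h, U_dropped

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))   # (timesteps, features)
W = rng.normal(size=(3, 4))   # input-to-hidden weights
U = rng.normal(size=(4, 4))   # hidden-to-hidden weights

h_last, U_dropped = forward_one_batch(x, W, U, rate=0.5, rng=rng)
```

Because the masking happens on the weights and outside the sequence loop, the same trick can in principle be applied to a black-box RNN kernel by masking its recurrent weights before each training batch.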
@rsmith49 You can use the TensorLayer implementation [1] of DropConnect directly on Keras. There's an example where you can interchange Keras & TensorLayer together [2].
[1] http://tensorlayer.readthedocs.io/en/latest/modules/layers.html#dropconnect-dense-layer [2] https://github.com/tensorlayer/tensorlayer/blob/master/example/tutorial_keras.py
@fchollet when you say that:
Actually using such dropout in a stacked RNN will wreck training.
Do you refer to this paper?
@brunoalano Do you know of any implementations of DropConnect applied to an LSTM layer? The link you provided only has DropconnectDenseLayer, and I did not find any in TensorLayer's recurrent.py.
untested implementation: https://github.com/andry9454/KerasDropconnect
^^ That implementation does not seem right..
+1, would find useful
any progress?
+1
+1
+1
+1
I would like to ask if there are any further updates regarding this (DropConnect on the recurrent connection). I tried to implement a custom recurrent_regularizer that calls tf.nn.dropout on the hidden-to-hidden weights, but I don't think it is working properly. For some reason the returned loss is an array of size (sequence_length, sequence_length*4).
Haste might be helpful for implementing this.
@icoxfog417 Thanks for linking, they appear way ahead of TensorFlow on this.
Is it possible to use https://github.com/lmnt-com/haste on Windows 10 with tensorflow-gpu 2.0?
Any progress or updates regarding implementing recurrent_dropout in tensorflow.keras?
I have a performance problem as a result of this as well (detailed here: https://github.com/tensorflow/tensorflow/issues/40944)
Native Keras GRU and LSTM layers support dropout and recurrent_dropout, but their CuDNN-accelerated counterparts, CuDNNLSTM and CuDNNGRU, do not. It might be good to add these features. Although CuDNN RNNs do not support dropout natively, it seems to be possible to implement it outside of CuDNN. At least TensorFlow is capable of that. In Keras, dropout can be applied either on the inputs (dropout), which should be straightforward, or on the previous hidden state (recurrent_dropout). I'm not sure if the latter is possible, though.
The motivation is to use the CuDNN RNN implementation for fast training while still allowing dropout regularization.
Please comment if this makes sense or if it is wanted. I'd be happy to try implementing that. Thanks.