sherjilozair / char-rnn-tensorflow

Multi-layer Recurrent Neural Networks (LSTM, RNN) for character-level language models in Python using Tensorflow
MIT License

Dropouts #35

Closed anantzoid closed 7 years ago

anantzoid commented 8 years ago

Implemented dropout using TensorFlow's DropoutWrapper around rnn_cell and the dropout function for embedded_input.
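
A rough sketch of that approach, assuming the usual char-rnn-tensorflow model structure; the helper names, the training flag, and the single keep_prob are illustrative, not the PR's exact diff:

```python
import tensorflow as tf

def build_rnn_cell(rnn_size, num_layers, keep_prob, training):
    # Stack LSTM cells; while training, wrap each one so dropout is applied
    # to its outputs via tf.contrib.rnn.DropoutWrapper.
    cells = []
    for _ in range(num_layers):
        cell = tf.contrib.rnn.BasicLSTMCell(rnn_size)
        if training and keep_prob < 1.0:
            cell = tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)
        cells.append(cell)
    return tf.contrib.rnn.MultiRNNCell(cells)

def embed_inputs(embedding, input_data, keep_prob, training):
    # Look up character embeddings and drop units from the embedded input
    # with tf.nn.dropout, again only while training.
    inputs = tf.nn.embedding_lookup(embedding, input_data)
    if training and keep_prob < 1.0:
        inputs = tf.nn.dropout(inputs, keep_prob)
    return inputs
```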

sherjilozair commented 8 years ago

Hi, there is an infer variable which is essentially !training. Only one of the two should be used. I think training is a better name, so maybe use that everywhere instead?
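
Purely as an illustration of the suggestion (not the actual diff), keeping a single training flag could look like this small helper:

```python
def effective_keep_prob(keep_prob, training):
    # A single `training` flag replaces `infer`; dropout is disabled (keep
    # everything) whenever we are not training, e.g. while sampling.
    return keep_prob if training else 1.0
```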

Thanks for your contribution.

anantzoid commented 8 years ago

Please check ^

weimingchen commented 8 years ago

Hi, will this pull request be merged soon?

Using appropriate dropout will definitely make for a more robust model.

hunkim commented 8 years ago

@sherjilozair This looks good. Are you going to merge?

@anantzoid, do you think you can send this dropouts PR for word-rnn, https://github.com/hunkim/word-rnn-tensorflow/? I'll merge right away. Thanks in advance.

hugovk commented 7 years ago

@anantzoid This PR has merge conflicts.

anantzoid commented 7 years ago

@hugovk This is resolved.

ubergarm commented 7 years ago

The latest TensorFlow release I tried has options for both input_keep_prob and output_keep_prob, so I'll give these a read-through:

  1. https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/DropoutWrapper
  2. https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

Once I understand dropout better, I'll try to add CLI options for both input and output, then get this tested and merged. Thanks!
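
For reference, a minimal sketch of the wrapper from the first link, which exposes separate keep probabilities for a cell's inputs and outputs (both default to 1.0, i.e. no dropout):

```python
import tensorflow as tf

cell = tf.contrib.rnn.BasicLSTMCell(128)
cell = tf.contrib.rnn.DropoutWrapper(
    cell,
    input_keep_prob=0.8,    # probability of keeping each input unit
    output_keep_prob=0.5)   # probability of keeping each output unit
```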

ubergarm commented 7 years ago

Okay, going to try to get this merged with:

  1. both input and output keep probabilities available as CLI options
  2. express the dropout rate p as defined below (a probability in [0,1] of keeping an input or output unit)
  3. I don't think it's a perfect match to equate input_keep_prob with "input layers" and output_keep_prob with "hidden layers" (it depends on architecture depth, etc.), but it's a good enough start for now.
  4. If both of these values are exactly 1.0, dropout will be disabled so it doesn't waste any computation. (See the sketch after this list.)
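
A hedged sketch of that plan; the flag names and helper are assumptions about how it could look, not necessarily what gets merged:

```python
import argparse
import tensorflow as tf

parser = argparse.ArgumentParser()
parser.add_argument('--input_keep_prob', type=float, default=1.0,
                    help='probability of keeping an input unit (1.0 = no input dropout)')
parser.add_argument('--output_keep_prob', type=float, default=1.0,
                    help='probability of keeping an output unit (1.0 = no output dropout)')

def maybe_add_dropout(cell, input_keep_prob, output_keep_prob, training):
    # Skip the wrapper entirely when both probabilities are exactly 1.0,
    # so disabled dropout costs nothing.
    if training and (input_keep_prob < 1.0 or output_keep_prob < 1.0):
        cell = tf.contrib.rnn.DropoutWrapper(
            cell,
            input_keep_prob=input_keep_prob,
            output_keep_prob=output_keep_prob)
    return cell
```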

Reference

https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf:

Dropout introduces an extra hyperparameter—the probability of retaining a unit p. This
hyperparameter controls the intensity of dropout. p = 1, implies no dropout and low values
of p mean more dropout. Typical values of p for hidden units are in the range 0.5 to 0.8.
For input layers, the choice depends on the kind of input. For real-valued inputs (image
patches or speech frames), a typical value is 0.8. For hidden layers, the choice of p is coupled
with the choice of number of hidden units n. Smaller p requires big n which slows down
the training and leads to underfitting. Large p may not produce enough dropout to prevent
overfitting.

ubergarm commented 7 years ago

@anantzoid, almost got it. The only odd thing I see is that this line was changed in the PR:

output = tf.reshape(tf.concat(1, outputs), [-1, args.rnn_size])

Which raises the error:

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

So I put it back to the original, which seems to work, and dropout appears in the TensorBoard graph. This is what it was, and still is:

output = tf.reshape(tf.concat(outputs, 1), [-1, args.rnn_size])

If there was a reason for this that I missed, please let me know!
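
For what it's worth, the likely cause is the tf.concat argument-order change in TensorFlow 1.0: the old signature was tf.concat(concat_dim, values), the new one is tf.concat(values, axis), and the old call raising the TypeError above is what you'd see on a newer release. A quick illustration:

```python
import tensorflow as tf

outputs = [tf.zeros([4, 8]), tf.zeros([4, 8])]   # stand-in for the unrolled RNN outputs

# TensorFlow 1.0+ form: values first, then the axis.
output = tf.reshape(tf.concat(outputs, 1), [-1, 8])

# The pre-1.0 form, tf.concat(1, outputs), raises the TypeError above on 1.0+.
```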

Going to add a quick patch to extend to both input and output dropout CLI args then merge.

ubergarm commented 7 years ago

Thanks @anantzoid, I could use some code review, especially right here. I presumed it should use output_keep_prob, since that is what you were originally using as keep_prob.

https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py#L47-L49
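
For context, and only as a guess rather than a copy of the linked lines, the mapping I assumed is roughly: the PR's single keep_prob becomes output_keep_prob, with input_keep_prob handled as a separate option:

```python
import tensorflow as tf

def wrap_cell(cell, args, training=True):
    # Hypothetical sketch of the mapping under review; not the actual
    # model.py lines linked above.
    if training:
        cell = tf.contrib.rnn.DropoutWrapper(
            cell,
            input_keep_prob=args.input_keep_prob,     # new CLI option
            output_keep_prob=args.output_keep_prob)   # plays the role of the PR's keep_prob
    return cell
```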