anantzoid closed this pull request 7 years ago.
Hi, there is an `infer` variable which is essentially `!training`. Only one of the two should be used. I think `training` is a better name, so maybe use that everywhere instead?
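For context, a minimal sketch of what the suggested rename could look like (the `training` flag and `args.output_keep_prob` names here are illustrative, not the repo's exact code):

```python
# Hedged sketch: keep a single `training` flag instead of carrying
# both `infer` and `training` (where infer == not training).
def keep_prob_for(args, training):
    # Dropout should only fire while training; at inference time
    # every unit is kept, i.e. keep probability 1.0.
    return args.output_keep_prob if training else 1.0
```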
Thanks for your contribution.
Please check ^
Hi, will this pull request be merged soon?
Using appropriate dropouts will definitely make a more robust model.
@sherjilozair This looks good. Are you going to merge?
@anantzoid, do you think you can send this dropouts PR for word-rnn, https://github.com/hunkim/word-rnn-tensorflow/? I'll merge right away. Thanks in advance.
@anantzoid This PR has merge conflicts.
@hugovk This is resolved.
The latest TensorFlow I tried has options for both `input_keep_prob` and `output_keep_prob`, so I'll give it a read-through.
After better understanding dropout, I'll try to add an option for both input and output on the CLI and get this tested and merged. Thanks!
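For reference, this is roughly what the TensorFlow 1.x call with both options looks like (a sketch, not the repo's code; the cell size and probabilities are placeholders):

```python
import tensorflow as tf

# Sketch: wrap an LSTM cell so dropout applies to both its inputs and
# its outputs. Both arguments are probabilities of *keeping* a unit,
# so 1.0 means no dropout at all.
cell = tf.contrib.rnn.BasicLSTMCell(128)
cell = tf.contrib.rnn.DropoutWrapper(
    cell,
    input_keep_prob=0.8,   # dropout on the cell's inputs
    output_keep_prob=0.5,  # dropout on the cell's outputs
)
```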
Okay, going to try to get this merged with dropout rate `p` as defined below (a probability in [0, 1] of keeping the input or output). Roughly, `input_keep_prob` = "input layers" and `output_keep_prob` = "hidden layers", depending on the architecture depth etc., but a good enough start for now. From https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf:
> Dropout introduces an extra hyperparameter—the probability of retaining a unit p. This hyperparameter controls the intensity of dropout. p = 1, implies no dropout and low values of p mean more dropout. Typical values of p for hidden units are in the range 0.5 to 0.8. For input layers, the choice depends on the kind of input. For real-valued inputs (image patches or speech frames), a typical value is 0.8. For hidden layers, the choice of p is coupled with the choice of number of hidden units n. Smaller p requires big n which slows down the training and leads to underfitting. Large p may not produce enough dropout to prevent overfitting.
@anantzoid, almost got it. The only odd thing I see is that this line was changed in the PR:

```python
output = tf.reshape(tf.concat(1, outputs), [-1, args.rnn_size])
```

which raises the error:

```
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
```

(That's the pre-1.0 argument order: TensorFlow 1.0 changed the signature from `tf.concat(concat_dim, values)` to `tf.concat(values, axis)`.) So I put it back to what the original was, which seems to work, and dropout appears in the TensorBoard graph. This is what it was, and still is:

```python
output = tf.reshape(tf.concat(outputs, 1), [-1, args.rnn_size])
```

If there was a reason for this that I missed, please let me know!
Going to add a quick patch to extend to both input and output dropout CLI args then merge.
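A rough sketch of what those CLI args could look like (flag names and defaults are assumptions following the paper's typical values, not necessarily the merged code):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical flags; defaulting to 1.0 (keep everything) preserves the
# old no-dropout behaviour, and users can opt in to dropout.
parser.add_argument('--input_keep_prob', type=float, default=1.0,
                    help='probability of keeping a unit at the input layer '
                         '(the paper suggests ~0.8 for real-valued inputs)')
parser.add_argument('--output_keep_prob', type=float, default=1.0,
                    help='probability of keeping a hidden/output unit '
                         '(typical values are 0.5 to 0.8)')
args = parser.parse_args()
```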
Thanks @anantzoid, I could use some code review, especially right here. I presumed it should use `output_keep_prob`, as that is what you were using originally as `keep_prob`:

https://github.com/sherjilozair/char-rnn-tensorflow/blob/master/model.py#L47-L49
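For reviewers, a hedged sketch of the overall pattern the PR describes (a `DropoutWrapper` around the cell plus dropout on the embedded input); the variable names and the `training` guard are assumptions, not a verbatim copy of model.py:

```python
# Sketch of the described approach, not the exact file contents.
cell = tf.contrib.rnn.DropoutWrapper(
    cell, output_keep_prob=args.output_keep_prob)

inputs = tf.nn.embedding_lookup(embedding, input_data)
if training:  # hypothetical flag: only drop units while training
    inputs = tf.nn.dropout(inputs, args.output_keep_prob)
```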
Implemented dropouts using TensorFlow's `DropoutWrapper` around `rnn_cell` and a dropout function for `embedded_input`.