bitfort closed 4 years ago
More Notes from Google:
This is the code to do dropout_broadcast/noise_shape:
if broadcast_dims:
    shape = tf.shape(x)
    ndims = len(x.get_shape())
    kwargs["noise_shape"] = [
        1 if i in broadcast_dims else shape[i] for i in range(ndims)
    ]
tf.nn.dropout(x, keep_prob, **kwargs)
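To make the effect of noise_shape concrete, here is a minimal NumPy sketch of the same idea. It is not the TensorFlow implementation. The helper names (broadcast_noise_shape, dropout_with_broadcast), the handling of negative dimension indices, and the [batch, length, hidden] example shape are assumptions made for illustration.

```python
import numpy as np


def broadcast_noise_shape(shape, broadcast_dims):
    """Return the noise shape: 1 on broadcast dims, the input size elsewhere."""
    ndims = len(shape)
    # Accept negative indices such as -1 (an assumption of this sketch).
    dims = {d + ndims if d < 0 else d for d in broadcast_dims}
    return [1 if i in dims else shape[i] for i in range(ndims)]


def dropout_with_broadcast(x, keep_prob, broadcast_dims, rng):
    """Inverted dropout whose keep/drop mask is shared along broadcast_dims."""
    noise_shape = tuple(broadcast_noise_shape(x.shape, broadcast_dims))
    # One Bernoulli(keep_prob) draw per entry of noise_shape,
    # not one per element of x.
    mask = rng.random(noise_shape) < keep_prob
    # NumPy broadcasting repeats the mask along the size-1 dimensions.
    return x * mask / keep_prob


rng = np.random.default_rng(0)
x = np.ones((2, 4, 3))  # hypothetical [batch, length, hidden] activations
y = dropout_with_broadcast(x, keep_prob=0.5, broadcast_dims=[1], rng=rng)
```

With broadcast_dims=[1], every position along the length dimension shares one keep/drop decision per (batch, hidden) pair, so far fewer random numbers are drawn than with per-element dropout.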
Convergence data: it does not affect the number of epochs to convergence, at least at a global batch size of 131072 tokens. We don't have recent data for larger batch sizes.
SWG:
Discussed in SWG and no objections noted.
We will clarify in rules that you can tune dropout broadcast.
SWG: We note that a request has been made for additional discussion.
AI (Victor): Schedule meeting.
We are withdrawing.
We are seeking clarification to confirm that tuning "noise_shape" for dropout is allowed. https://www.tensorflow.org/api_docs/python/tf/nn/dropout