Clarification that "noise_shape" in Dropout is Tunable

mlcommons / training_policies

Issues related to MLPerf™ training policies, including rules and suggested changes

https://mlcommons.org/en/groups/training

Apache License 2.0

Clarification that "noise_shape" in Dropout is Tunable #298

Closed bitfort closed 4 years ago

bitfort commented 4 years ago

We are seeking clarification to confirm that tuning "noise_shape" for dropout is allowed. https://www.tensorflow.org/api_docs/python/tf/nn/dropout

bitfort commented 4 years ago

More Notes from Google:

This is the code to do dropout_broadcast/noise_shape:

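# x, keep_prob, broadcast_dims and kwargs are assumed to be defined by the surrounding code.
if broadcast_dims:
    shape = tf.shape(x)         # dynamic shape of x
    ndims = len(x.get_shape())  # static rank of x
    # Size 1 along each broadcast dimension, so one mask value is shared there.
    kwargs["noise_shape"] = [
        1 if i in broadcast_dims else shape[i] for i in range(ndims)
    ]
# keep_prob is the TF 1.x argument of tf.nn.dropout.
tf.nn.dropout(x, keep_prob, **kwargs)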
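
To make the effect of the snippet above concrete, here is a minimal pure-Python sketch (no TensorFlow) of how a broadcast noise_shape shares one dropout mask along a dimension. The helper names `broadcast_noise_shape` and `dropout_2d` are hypothetical, and the handling of negative indices is an added assumption that is not in the snippet above. It illustrates the idea only and is not the code used in any submission.

```python
import random


def broadcast_noise_shape(shape, broadcast_dims):
    """Return a noise_shape list: 1 for broadcast dims, full size otherwise."""
    ndims = len(shape)
    # Assumption (not in the snippet above): accept negative indices such as -1.
    dims = {d + ndims if d < 0 else d for d in broadcast_dims}
    return [1 if i in dims else shape[i] for i in range(ndims)]


def dropout_2d(x, keep_prob, noise_shape, rng):
    """Inverted dropout on a list of lists, broadcasting a mask of noise_shape."""
    rows, cols = len(x), len(x[0])
    mask_rows, mask_cols = noise_shape
    # Sample the keep/drop mask only at the (possibly smaller) noise shape.
    mask = [
        [1.0 if rng.random() < keep_prob else 0.0 for _ in range(mask_cols)]
        for _ in range(mask_rows)
    ]
    # A mask dimension of size 1 is reused for every index along that dimension.
    return [
        [x[r][c] * mask[r % mask_rows][c % mask_cols] / keep_prob for c in range(cols)]
        for r in range(rows)
    ]


rng = random.Random(0)
x = [[1.0] * 4 for _ in range(3)]
noise_shape = broadcast_noise_shape([3, 4], broadcast_dims=[0])
y = dropout_2d(x, 0.5, noise_shape, rng)
```

Because dimension 0 is broadcast here, every row of `y` receives the same mask, so fewer random values are sampled than with a full-shape mask.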

Regarding convergence data: it does not affect the number of epochs to convergence, at least at a global batch size of 131072 tokens. For larger batch sizes, we don't have recent data.

bitfort commented 4 years ago

SWG:

Discussed in the SWG; no objections were noted.

We will clarify in the rules that dropout broadcast ("noise_shape") may be tuned.

bitfort commented 4 years ago

SWG: We note that a request has been made for additional discussion.

bitfort commented 4 years ago

AI (Victor): Schedule meeting.

petermattson commented 4 years ago

We are withdrawing.