google / prettytensor

Pretty Tensor: Fluent Networks in TensorFlow

How to use batch-normalization? #30

Closed Hvass-Labs closed 7 years ago

Hvass-Labs commented 8 years ago

Once again I hope it's OK that I ask this question here instead of on StackOverflow.

I don't know if batch-normalization is really useful; there seem to be differing opinions on the matter. But I'd like to try it. I can see that it's implemented in Pretty Tensor:

https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize

But I can't figure out how to use it for the following Convolutional Neural Network:

with pt.defaults_scope(activation_fn=tf.nn.relu):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=64, name='layer_conv2').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1').\
        fully_connected(size=128, name='layer_fc2').\
        softmax_classifier(class_count=10, labels=y_true)

Any help would be appreciated.

eiderman commented 8 years ago

The easiest way to use it is to pass the batch_normalize argument to conv2d or fully_connected (both also support setting it through defaults_scope). The argument takes a pt.BatchNormalizationArguments object, which can be shared across layers (and I see now that the auto-generated documentation is broken here). The defaults are likely to be good enough for experimentation; the main one you might want to toggle is scale_after_normalization, which controls whether a learned multiplier is used.
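
For example, a minimal sketch of setting it once through defaults_scope rather than per layer (assuming the x_pretty and y_true from your snippet; the layer sizes are just illustrative):

norm = pt.BatchNormalizationArguments(scale_after_normalization=True)

# The shared BatchNormalizationArguments object is picked up by every
# conv2d / fully_connected created inside this scope.
with pt.defaults_scope(activation_fn=tf.nn.relu, batch_normalize=norm):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1').\
        softmax_classifier(class_count=10, labels=y_true)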

Batch normalization gives a much bigger win in convolutional layers than in fully connected ones, although you may still see a benefit there if your batch size is big enough. Also, it is important to note that your train and test networks will be different when using batch normalization, so you will need to construct one with phase=pt.Phase.train and the other with phase=pt.Phase.test (roughly as in the sketch below). This is because the actual batch statistics are used during training, while at test time a running average collected during training is used instead.
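
A rough sketch of that train/test split (build_net is just an illustrative helper around your chain, and reusing the variable scope is one way to make the two graphs share weights):

def build_net(x_pretty, y_true):
    return x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1').\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1').\
        softmax_classifier(class_count=10, labels=y_true)

# Training graph: normalizes with the statistics of each batch.
with tf.variable_scope('net'):
    with pt.defaults_scope(activation_fn=tf.nn.relu, phase=pt.Phase.train):
        y_pred_train, loss = build_net(x_pretty, y_true)

# Test graph: same weights, but uses the running averages instead.
with tf.variable_scope('net', reuse=True):
    with pt.defaults_scope(activation_fn=tf.nn.relu, phase=pt.Phase.test):
        y_pred_test, _ = build_net(x_pretty, y_true)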

Also, I don't mind answering questions here, especially if you close them when you are satisfied.

Hvass-Labs commented 8 years ago

I've looked at the conv2d function, which takes a batch_normalize argument that is a BatchNormalizationArguments object, as you say. Confusingly, it is set to False by default, where I would have thought None would be more appropriate.

Anyway, when I search for its definition using the PyCharm editor, I end up in the file pretty_tensor_normalization_methods.py with the following:

BatchNormalizationArguments = collections.namedtuple(
    'BatchNormalizationArguments',
    ('learned_moments_update_rate', 'variance_epsilon',
     'scale_after_normalization'))

There's not much documentation in this file so it isn't clear how to use this, what the parameters mean and which values might be appropriate.

Could I ask you to modify the code I wrote above to give an example of how to do this?

Thanks.

eiderman commented 8 years ago

The parameters map directly to the batch_normalization method. It is unlikely that you will want to change variance_epsilon, since it mostly just avoids numerical problems around 0. learned_moments_update_rate also has a reasonable default (it changes the decay factor for the exponential moving average used at test/inference time). They are exposed mostly for completeness and to support edge cases where they may be useful. You may find some value in playing with scale_after_normalization, which controls a multiplier applied to each depth channel. See https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize for more details.

norm = pt.BatchNormalizationArguments(scale_after_normalization=True)

with pt.defaults_scope(activation_fn=tf.nn.relu, phase=pt.Phase.train):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=norm).\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=64, name='layer_conv2', batch_normalize=norm).\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1', batch_normalize=norm).\
        fully_connected(size=128, name='layer_fc2').\
        softmax_classifier(class_count=10, labels=y_true)

Hvass-Labs commented 8 years ago

Thanks for the example. However, I get an error using your sample code.

First I write the following which you suggest:

norm = pt.BatchNormalizationArguments(scale_after_normalization=True)

And in the code for defining the network I have the following line for creating the first conv-layer, as you suggest:

conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=norm).\

But this causes the following exception:

UnboundLocalError: local variable 'kwargs' referenced before assignment

which is raised in line 1981 in pretty_tensor_class.py which reads:

result = func(non_seq_layer, *args, **kwargs)

What I do instead is pass batch_normalize=True in the call to conv2d(), but it's not really clear from the docs what this does.

I've read the following doc which you suggested, but it really doesn't explain much:

https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize

The docs also don't make clear what the different phases in Pretty Tensor actually do. When I look at the docs for e.g. evaluate_precision_recall(), it appears that the testing phase completely changes the semantics of the function, so I probably don't want to use Pretty Tensor's notion of training/testing phases: it might change semantics in unpredictable and undocumented ways, which would cause bugs in my code that are very hard to find.

Once again I'd like to encourage you to significantly improve the documentation, because it is frustrating to try to learn Pretty Tensor from the current docs. Scikit-learn has very good documentation that could serve as inspiration. But I read in the TensorFlow forum that the dev team is currently consolidating the builder APIs, so perhaps you have different plans going forward?

eiderman commented 8 years ago

Thanks for the bug report; I found the problem and will provide a fix.

Pretty Tensor is supported and alive, but it is a rather small effort now and so fixes take time.

The larger effort is tf.contrib.learn and it has some tutorials here: https://www.tensorflow.org/versions/r0.9/tutorials/index.html

These can all be mixed and matched, and any functions you like that are missing can be added (the simplest way would be pt.Register(tf.contrib.layers.BLAH); see the sketch below). I'm taking the documentation request seriously.
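
For example, a rough sketch of registering a custom op as a chainable method (leaky_relu here is only an illustration, not something that ships with Pretty Tensor):

import tensorflow as tf
import prettytensor as pt

@pt.Register()
def leaky_relu(input_tensor, alpha=0.01):
    # The registered function receives the input tensor and returns a tensor,
    # which Pretty Tensor wraps again so it stays chainable.
    return tf.maximum(input_tensor, alpha * input_tensor)

# After registration it can be used like any built-in method:
#   x_pretty.conv2d(kernel=5, depth=64).leaky_relu(alpha=0.1)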