The easiest way to use them is to provide the batch_normalize argument to conv2d or fully_connected (both support setting it with defaults_scope as well). This takes a pt.BatchNormalizationArguments object (which can be shared; I see now that the auto-generated documentation is broken here). The defaults are likely to be good enough for experimentation, and the main one you would want to toggle is probably scale_after_normalization, which controls whether a learned multiplier is used.
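For example, a rough sketch (the input wrapping and placeholder shape here are just assumptions for illustration):

import prettytensor as pt
import tensorflow as tf

# Illustrative input; the shape is only an assumption.
x_pretty = pt.wrap(tf.placeholder(tf.float32, shape=[None, 28, 28, 1]))

norm = pt.BatchNormalizationArguments(scale_after_normalization=True)

# Passed to an individual layer:
net = x_pretty.conv2d(kernel=5, depth=64, batch_normalize=norm)

# Or set once for every layer inside a defaults_scope, per the note above:
with pt.defaults_scope(batch_normalize=norm):
    net = x_pretty.conv2d(kernel=5, depth=16).conv2d(kernel=5, depth=32)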
There is a lot more win with batch normalization in convolutional layers than in fully connected ones, but you may get a win there too if your batch size is big enough. Also, it is important to note that your train and test networks will be different when using batch normalization, so you will need to construct one with phase=pt.Phase.train and the other with phase=pt.Phase.test. This is because during training the actual batch statistics are used, while during test a running average collected during training is used instead.
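A rough sketch of that two-phase construction (build_model and the input names here are hypothetical; variables are shared between the two graphs through the variable scope):

import tensorflow as tf
import prettytensor as pt

# Hypothetical inputs; in practice these would be your training and test tensors.
x_train_pretty = pt.wrap(tf.placeholder(tf.float32, [None, 28, 28, 1]))
x_test_pretty = pt.wrap(tf.placeholder(tf.float32, [None, 28, 28, 1]))

def build_model(inputs):
    # Hypothetical model; the point is only that the same code is built twice.
    return inputs.\
        conv2d(kernel=5, depth=64, batch_normalize=True).\
        flatten().\
        fully_connected(size=10)

with tf.variable_scope('model'):
    with pt.defaults_scope(phase=pt.Phase.train):
        # Uses the statistics of each batch.
        train_net = build_model(x_train_pretty)

with tf.variable_scope('model', reuse=True):
    with pt.defaults_scope(phase=pt.Phase.test):
        # Reuses the trained weights and the running averages from training.
        test_net = build_model(x_test_pretty)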
Also, I don't mind answering questions here, especially if you close them when you are satisfied.
I've looked at the function conv2d, which takes an argument batch_normalize, which is a BatchNormalizationArguments object, as you say. Confusingly, it is set to False by default, where I would have thought None would be more appropriate?

Anyway, when I search for its definition using the PyCharm editor, I end up in the file pretty_tensor_normalization_methods.py with the following:
BatchNormalizationArguments = collections.namedtuple(
    'BatchNormalizationArguments',
    ('learned_moments_update_rate', 'variance_epsilon',
     'scale_after_normalization'))
There's not much documentation in this file, so it isn't clear how to use this, what the parameters mean, or which values might be appropriate.
Could I ask you to modify the code I wrote above to give an example of how to do this?
Thanks.
The parameters map directly to the batch_normalization method. It is unlikely that you will want to change variance_epsilon, since it mostly just avoids problems around 0. learned_moments_update_rate also has a reasonable default (it changes the decay factor for the exponential moving average used in test or inference). They are exposed more for completeness and to support edge cases where they may be useful. You may find some value in playing with scale_after_normalization, which controls a multiplier applied to each depth channel. See https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize for more details.
norm = pt.BatchNormalizationArguments(scale_after_normalization=True)

with pt.defaults_scope(activation_fn=tf.nn.relu, phase=pt.Phase.train):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=norm).\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=64, name='layer_conv2', batch_normalize=norm).\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1', batch_normalize=norm).\
        fully_connected(size=128, name='layer_fc2').\
        softmax_classifier(class_count=10, labels=y_true)
Thanks for the example. However, I get an error using your sample code.
First I write the following line, as you suggest:
norm = pt.BatchNormalizationArguments(scale_after_normalization=True)
And in the code for defining the network I have the following line for creating the first conv-layer, as you suggest:
conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=norm).\
But this causes the following exception:
UnboundLocalError: local variable 'kwargs' referenced before assignment
which is raised at line 1981 of pretty_tensor_class.py, which reads:
result = func(non_seq_layer, *args, **kwargs)
What I do instead is to use batch_normalize=True in the call to conv2d(). But it's not really clear from the docs what this does.
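For reference, the chain I have now is basically your example with batch_normalize=True instead of the norm object (same x_pretty and y_true setup as before):

with pt.defaults_scope(activation_fn=tf.nn.relu, phase=pt.Phase.train):
    y_pred, loss = x_pretty.\
        conv2d(kernel=5, depth=64, name='layer_conv1', batch_normalize=True).\
        max_pool(kernel=2, stride=2).\
        conv2d(kernel=5, depth=64, name='layer_conv2', batch_normalize=True).\
        max_pool(kernel=2, stride=2).\
        flatten().\
        fully_connected(size=256, name='layer_fc1', batch_normalize=True).\
        fully_connected(size=128, name='layer_fc2').\
        softmax_classifier(class_count=10, labels=y_true)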
I've read the following doc which you suggested, but it really doesn't explain much:
https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize
The docs also don't make clear what the difference is between the phases in Pretty Tensor. When I look at the docs for e.g. evaluate_precision_recall(), it appears to completely change the semantics of the function in the testing phase, so I probably don't want to use Pretty Tensor's notion of training/testing phases, because it might change the semantics in unpredictable and undocumented ways, which would cause bugs in my code that are very hard to find.

Once again I'd like to encourage you to significantly improve the documentation, because it is frustrating to try to learn how to use Pretty Tensor from the current docs. Scikit-learn has very good documentation which could serve as inspiration. But I read in the TensorFlow forum that the dev team is currently consolidating the builder APIs, so perhaps you have different plans going forward?
Thanks for the bug report; I found the problem and will provide a fix.
Pretty Tensor is supported and alive, but it is a rather small effort now and so fixes take time.
The larger effort is tf.contrib.learn and it has some tutorials here: https://www.tensorflow.org/versions/r0.9/tutorials/index.html
These can all be mixed and matched, and any functions that you like but are missing can be added (the simplest way would be by doing pt.Register(tf.contrib.layers.BLAH)). I'm taking the documentation request seriously.
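Something along these lines (purely illustrative, following the pt.Register pattern above; the specific tf.contrib.layers function below is only a placeholder):

import prettytensor as pt
import tensorflow as tf

# Illustrative only: expose an existing tf.contrib.layers function as a
# Pretty Tensor method. The function chosen here is just a placeholder.
pt.Register(tf.contrib.layers.batch_norm)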
Once again I hope it's OK that I ask this question here instead of on StackOverflow.
I don't know if batch normalization is really useful; there seem to be different opinions on the matter. But I'd like to try it. I can see that it's implemented in Pretty Tensor:
https://github.com/google/prettytensor/blob/master/docs/PrettyTensor.md#batch_normalize
But I can't figure out how to use it for the following Convolutional Neural Network:
Any help would be appreciated.