openai / supervised-reptile

Code for the paper "On First-Order Meta-Learning Algorithms"
https://arxiv.org/abs/1803.02999
MIT License

About batchnorm #12

Closed jaegerstar closed 6 years ago

jaegerstar commented 6 years ago

Why did you implement it so that batchnorm stays in training mode the whole time?

out = tf.layers.batch_normalization(out, training=True)

Shouldn't it be turned off at test time during meta-testing?

unixpickle commented 6 years ago

There are several ways to deal with batchnorm at test time. Setting training=False uses rolling moment data, but that is far from the only option. Also, using batchnorm at test time across a smaller set of samples is likely helpful for distinguishing between those samples.
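To make the two modes concrete, here is a minimal numpy sketch (not the repo's code) of what training=True vs training=False means for batchnorm: the former normalizes with the current batch's own statistics while updating rolling moments, the latter normalizes with the accumulated rolling moments. Scale/shift parameters are omitted for brevity.

```python
import numpy as np

class SimpleBatchNorm:
    """Minimal batchnorm (no learned scale/shift) contrasting the two modes:
    training=True normalizes with the current batch's statistics;
    training=False normalizes with rolling moments accumulated earlier."""

    def __init__(self, dim, momentum=0.99, eps=1e-5):
        self.rolling_mean = np.zeros(dim)
        self.rolling_var = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # use this batch's own statistics, and update the rolling moments
            mean, var = x.mean(axis=0), x.var(axis=0)
            self.rolling_mean = self.momentum * self.rolling_mean + (1 - self.momentum) * mean
            self.rolling_var = self.momentum * self.rolling_var + (1 - self.momentum) * var
        else:
            # use the rolling moments, independent of this batch
            mean, var = self.rolling_mean, self.rolling_var
        return (x - mean) / np.sqrt(var + self.eps)

bn = SimpleBatchNorm(dim=4)
x = np.random.randn(8, 4) * 3 + 5   # shifted, scaled batch
y_train = bn(x, training=True)       # normalized with this batch's stats
y_test = bn(x, training=False)       # normalized with the rolling moments
assert np.allclose(y_train.mean(axis=0), 0.0, atol=1e-6)
```

With training=True the output is always zero-mean per feature regardless of the input distribution, which is exactly why the choice matters at few-shot test time.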

For example, you can simply feed your network a batch containing the whole mini training set and a single sample from the mini test set. Another thing you can do is feed the entire mini dataset, allowing some information to leak between test samples through batchnorm. If you read the paper closely, you'll see that we use batchnorm in both of these ways. When using transduction, batchnorm is allowed to share info across all the test samples. This is technically not exactly the few shot objective people tend to talk about, but it's what was used in the MAML paper. In general, transduction tends to give a slight performance boost (which makes sense, since it is basically cheating).
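The two evaluation schemes described above can be sketched as batch construction, again as a hypothetical numpy illustration rather than the repo's actual evaluation code: non-transductive evaluation runs one forward pass per test sample (mini training set plus that single sample), while transductive evaluation runs one pass over the training set plus all test samples, letting batchnorm share statistics across them.

```python
import numpy as np

def eval_batches(train_x, test_x, transductive):
    """Yield the batches a hypothetical evaluator would feed the network.

    Non-transductive: one batch per test sample, containing the whole mini
    training set plus that single test sample, so no information leaks
    between test samples through batchnorm.
    Transductive: a single batch of the mini training set plus ALL test
    samples, so batchnorm shares statistics across the test samples.
    """
    if transductive:
        yield np.concatenate([train_x, test_x], axis=0)
    else:
        for x in test_x:
            yield np.concatenate([train_x, x[None]], axis=0)

# toy 1-D features: a 5-shot "training" set and 3 test samples
train_x = np.random.randn(5, 4)
test_x = np.random.randn(3, 4)

non_trans = list(eval_batches(train_x, test_x, transductive=False))
trans = list(eval_batches(train_x, test_x, transductive=True))
assert len(non_trans) == 3 and non_trans[0].shape == (6, 4)
assert len(trans) == 1 and trans[0].shape == (8, 4)
```

The slight boost from transduction falls out of this structure: each test sample's batchnorm statistics are computed over a batch that includes the other test samples.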

jaegerstar commented 6 years ago

Thanks for your reply. Can you provide some reference material about using batchnorm at test time? Or is it purely empirical?

unixpickle commented 6 years ago

BatchNorm at test time is usually more of a footnote than a focus. It's subtle and easy to get wrong, but it usually doesn't make enough of an impact to draw attention.