vacancy / Synchronized-BatchNorm-PyTorch

Synchronized Batch Normalization implementation in PyTorch.
MIT License

Unstable performance between adjacent epochs during test phase #5

Closed 7color94 closed 6 years ago

7color94 commented 6 years ago

Hi, thanks very much for your code.

However, when trained with the sync-bn layer, my model's performance seems unstable between adjacent epochs during the test phase: I evaluated the model on the test dataset every two epochs, and the resulting performance curve oscillates severely.

PS: I used the default momentum of 0.1. Does it seem too large?

vacancy commented 6 years ago

The momentum can be the reason for your unstable training. Could you please try some smaller values such as 1e-2 or 1e-3? I am not sure about your dataset and/or your application, but I think it's worth trying.
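For reference, one way to try a smaller momentum without rebuilding the model is to overwrite the `momentum` attribute on the normalization layers after construction. A minimal sketch, assuming the layers expose the usual `momentum` attribute as PyTorch's BatchNorm does (the model below is just a placeholder):

```python
import torch.nn as nn

# Placeholder model; any nn.Module containing BatchNorm-style layers works the
# same way. The repo's SynchronizedBatchNorm2d keeps its smoothing factor in a
# `momentum` attribute, mirroring the torch.nn.BatchNorm2d interface.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

# Lower the running-statistics momentum on every BatchNorm-like layer in place.
for m in model.modules():
    if 'BatchNorm' in type(m).__name__:
        m.momentum = 1e-3  # e.g. try 1e-2 or 1e-3 instead of the default 0.1
```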

vacancy commented 6 years ago

Closing for now. Please feel free to reopen it if you still have questions.

lyakaap commented 6 years ago

Hi,

I encountered the same issue. In my case, changing the momentum from the default value of 0.001 to 0.1 helped. I wonder why this implementation has such a low default momentum? It can cause this issue when the total number of training iterations is small.

ref: https://discuss.pytorch.org/t/model-eval-gives-incorrect-loss-for-model-with-batchnorm-layers/7561/6
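To make the effect of a small momentum concrete, here is a short numeric sketch of the exponential-moving-average update that PyTorch-style BatchNorm uses for its running statistics; it illustrates why a momentum of 0.001 can leave the estimates dominated by their initial (e.g. pretrained) values when the total number of iterations is small:

```python
# PyTorch-style running-statistics update:
#     running = (1 - momentum) * running + momentum * batch_stat
# After n updates, the initial value still carries a weight of (1 - momentum) ** n.
for momentum in (0.001, 0.1):
    for n in (100, 1000, 10000):
        leftover = (1 - momentum) ** n
        print(f"momentum={momentum}: after {n} BN updates, "
              f"{leftover:.1%} of the initial running stats remain")

# With momentum=0.001, roughly 37% of the old statistics still remain after
# 1000 updates, while momentum=0.1 forgets them almost completely within ~100.
```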

vacancy commented 6 years ago

Hi @lyakaap

Thanks for the information. I think the choice of momentum is sometimes a tricky issue, depending on your task/model, and it would be extremely helpful for others if you could provide some details on your model and how you chose the momentum.

In this repo, I use momentum=0.1 as the default value, which is consistent with the default setting in PyTorch's BatchNorm.

lyakaap commented 6 years ago

Thanks for the reply.

My target dataset is Cityscapes, and its distribution is far from that of ImageNet, on which the pretrained model was trained. So such a low momentum, 0.001, harms the estimation of the Cityscapes statistics.

> In this repo, I use momentum=0.1 as the default value, which is consistent with the default setting in PyTorch's BatchNorm.

Sorry, I confused this repo's settings with another BN repo's (I forgot the repo name). momentum=0.1 caused no problems, so it is fine to follow the official PyTorch setting.
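As an aside, when the pretrained ImageNet statistics are a poor fit for the target data, one generic PyTorch-style workaround (not something this repo provides; a sketch assuming placeholder `model` and `target_loader` objects) is to re-estimate the BN running statistics on the target dataset before evaluating:

```python
import torch

def reestimate_bn_stats(model, target_loader, momentum=0.1, num_batches=200):
    """Refresh BatchNorm running stats on the target data before evaluation."""
    model.train()  # BN updates its running stats only in train mode
    for m in model.modules():
        # Only touch BatchNorm-like layers that expose the standard interface.
        if 'BatchNorm' in type(m).__name__ and hasattr(m, 'reset_running_stats'):
            m.reset_running_stats()   # forget the pretrained statistics
            m.momentum = momentum     # use a reasonably large smoothing factor
    with torch.no_grad():             # no parameter updates, only BN stats
        for i, (images, _) in enumerate(target_loader):
            if i >= num_batches:
                break
            model(images)
    model.eval()
```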

vacancy commented 6 years ago

Thank you so much for the information! @lyakaap