Closed — yuzcccc closed this issue 7 years ago
same question
It's easy to write a train_val.prototxt from deploy.prototxt. You can do it yourself.
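For anyone unsure what that conversion involves, here is a hedged sketch of the typical changes: replace the Input layer of deploy.prototxt with Data layers and append a loss layer. The LMDB path, crop size, and the bottom blob name "fc6" below are placeholders, not taken from this repo:

```protobuf
# Hypothetical train_val.prototxt skeleton -- layer names and paths
# are illustrative placeholders.
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param { crop_size: 224 mirror: true }
  data_param { source: "train_lmdb" batch_size: 256 backend: LMDB }
}
# ... copy the body of deploy.prototxt here (conv/bn/scale/pool layers) ...
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc6"   # placeholder: use the actual final classifier blob
  bottom: "label"
  top: "loss"
}
```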
Here are my solver settings; batch size is 256, and it is quite simple too:

```protobuf
base_lr: 0.1
lr_policy: "poly"
power: 1.0
max_iter: 500000
momentum: 0.9
weight_decay: 0.0001
```
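For reference, Caffe's "poly" policy decays the learning rate as `base_lr * (1 - iter/max_iter)^power`; with `power: 1.0` that is a straight linear decay to zero. A small sketch of the schedule (the helper `poly_lr` is ours, not a Caffe function):

```python
# Hypothetical helper reproducing Caffe's "poly" lr_policy:
# lr = base_lr * (1 - iter / max_iter) ** power
def poly_lr(base_lr, iteration, max_iter, power):
    return base_lr * (1.0 - iteration / max_iter) ** power

# With the settings above (base_lr=0.1, power=1.0, max_iter=500000),
# the rate decays linearly from 0.1 down to 0.
print(poly_lr(0.1, 0, 500000, 1.0))       # 0.1 at the start
print(poly_lr(0.1, 250000, 500000, 1.0))  # 0.05 halfway through
print(poly_lr(0.1, 500000, 500000, 1.0))  # 0.0 at the end
```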
hi @shicai, have you ever tried another policy for learning rate? thanks
i just trained it once, and no other lr policies used.
Thanks for your wonderful work. I am not sure about the hyperparameters in your train prototxt. The BN layer is as follows:

```protobuf
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param { eps: 1e-4 }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}
```

Is it right?
That is OK for the test stage when using pretrained models, but for training you should add `param` fields to control the weight decay and learning rate multipliers.
Thanks for your reply. I wonder if the hyperparameters of the BatchNorm and Scale layers are the defaults (lr_mult=1.0 and decay_mult=1.0)? @shicai
For BatchNorm layers, lr_mult and decay_mult should be set to 0, since the mean/var params are not learned. But for Scale layers, lr_mult and decay_mult should be set as for conv layers.
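Putting that advice into prototxt form, a sketch of the commonly used pattern (note this is our reading of the reply, not a file from this repo; recent BVLC Caffe also forces the BatchNorm blob rates to zero internally, so the explicit zeros are belt-and-braces):

```protobuf
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }  # running mean (computed, not learned)
  param { lr_mult: 0 decay_mult: 0 }  # running variance (computed, not learned)
  param { lr_mult: 0 decay_mult: 0 }  # moving-average factor
  batch_norm_param { eps: 1e-4 }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 1.0 decay_mult: 1.0 }  # scale (gamma): learned like a conv weight
  param { lr_mult: 2.0 decay_mult: 0.0 }  # bias (beta): learned like a conv bias
  scale_param { bias_term: true }
}
```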
Thanks. I would like to train from scratch on the ImageNet dataset, so I think the params in the BatchNorm layer need to be learned (mean/var/factor etc.). Should lr_mult and decay_mult be set as follows?

```protobuf
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 1.0 decay_mult: 1.0 }
  batch_norm_param { eps: 1e-4 }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 2.0 decay_mult: 0.0 }
  scale_param { bias_term: true }
}
```
You should know that the params in a BatchNorm layer do not need to be learned; they are calculated. The layer just accumulates the mean/var values, so they are not really learnable params. Please don't set lr_mult or decay_mult for them.
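To make the distinction concrete, here is an illustrative sketch (plain Python, not Caffe code): the mean and variance are statistics computed from the batch, while gamma/beta, corresponding to Caffe's Scale layer with `bias_term: true`, are the only parameters updated by gradient descent:

```python
import math

def batch_norm(x, gamma, beta, eps=1e-4):
    """Toy 1-D batch norm: mean/var are computed, gamma/beta are learned."""
    n = len(x)
    mean = sum(x) / n                            # computed from the batch
    var = sum((v - mean) ** 2 for v in x) / n    # computed from the batch
    normed = [(v - mean) / math.sqrt(var + eps) for v in x]
    # gamma/beta play the role of Caffe's Scale layer (scale + bias)
    return [gamma * v + beta for v in normed]

out = batch_norm([1.0, 2.0, 3.0], gamma=1.0, beta=0.0)
# With gamma=1, beta=0 the output has ~zero mean and ~unit variance.
```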
```protobuf
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param { eps: 1e-4 }
}
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 1.0 decay_mult: 1.0 }
  param { lr_mult: 2.0 decay_mult: 0.0 }
  scale_param { bias_term: true }
}
```

Is it right?
Is it possible to provide the train & solver prototxt?