shicai / SENet-Caffe

A Caffe Re-Implementation of SENet
BSD 3-Clause "New" or "Revised" License

train & solver prototxt #1

Closed · yuzcccc closed this issue 7 years ago

yuzcccc commented 7 years ago

Is it possible to provide the train & solver prototxt?

kli-casia commented 7 years ago

same question

shicai commented 7 years ago

It's easy to write a train_val.prototxt from the deploy.prototxt; you can do it yourself. Here are my solver settings (batch size is 256); they are quite simple too:

    base_lr: 0.1
    lr_policy: "poly"
    power: 1.0
    max_iter: 500000
    momentum: 0.9
    weight_decay: 0.0001
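
For context, a full solver file built around these values might look like the sketch below. Only the six settings above come from shicai's reply; the net path, display/snapshot settings, and solver mode are illustrative assumptions.

```
# solver.prototxt sketch -- only the six hyperparameters quoted above are from
# the thread; every other field (paths, intervals, mode) is an assumption.
net: "models/senet/train_val.prototxt"   # placeholder path to a train_val.prototxt
base_lr: 0.1
lr_policy: "poly"                        # with power: 1.0 this decays the lr linearly to 0
power: 1.0
max_iter: 500000
momentum: 0.9
weight_decay: 0.0001
display: 20                              # assumed logging interval
snapshot: 10000                          # assumed snapshot interval
snapshot_prefix: "models/senet/senet"    # placeholder snapshot prefix
solver_mode: GPU                         # assumed
```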

zimenglan-sysu-512 commented 7 years ago

Hi @shicai, have you ever tried any other learning rate policies? Thanks.

shicai commented 7 years ago

I only trained it once, and didn't use any other lr policies.

wlw208dzy commented 7 years ago

Thanks for your wonderful work. I am not sure about the hyperparameters in your train prototxt. The BN layer is set up as follows:

    layer {
      name: "conv1/bn"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      batch_norm_param { eps: 1e-4 }
    }
    layer {
      name: "conv1/scale"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      scale_param { bias_term: true }
    }

Is it right?

shicai commented 7 years ago

It is OK for the test stage when using the pretrained models, but for training you should add param blocks to control the weight decay and learning rate multipliers.

wlw208dzy commented 7 years ago

Thanks for your reply. I wonder whether the hyperparameters of the BatchNorm and Scale layers should be left at their defaults (lr_mult=1.0 and decay_mult=1.0)? @shicai

shicai commented 7 years ago

For BatchNorm layers, lr_mult and decay_mult should be set to 0, since you don't need to learn the mean/var params. For Scale layers, lr_mult and decay_mult should be set the same as for conv layers.
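
In prototxt form, that advice could look like the sketch below. The layer names follow the earlier snippet; the exact Scale multipliers (lr_mult 1/2, decay_mult 1/0 for weight and bias) are a common convention for conv-style layers and are assumed here rather than quoted from shicai.

```
# Sketch of the suggested training settings, not an official config from the repo.
# BatchNorm has three internal blobs (mean, variance, moving-average factor);
# they are computed, not learned, so all multipliers are frozen at 0.
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  param { lr_mult: 0 decay_mult: 0 }
  batch_norm_param { eps: 1e-4 }
}
# Scale holds the learned gamma/beta, so it gets conv-like multipliers
# (the bias multipliers below are the usual convention, assumed here).
layer {
  name: "conv1/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  scale_param { bias_term: true }
}
```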

wlw208dzy commented 7 years ago

Thanks. I would like to train from scratch on the ImageNet dataset, so I think the params in the BatchNorm layer (mean/var/moving-average factor, etc.) need to be learned. Are the lr_mult and decay_mult values set as follows? @shicai

    layer {
      name: "conv1/bn"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 1.0 decay_mult: 1.0 }
      param { lr_mult: 1.0 decay_mult: 1.0 }
      param { lr_mult: 1.0 decay_mult: 1.0 }
      batch_norm_param { eps: 1e-4 }
    }
    layer {
      name: "conv1/scale"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 1.0 decay_mult: 1.0 }
      param { lr_mult: 2.0 decay_mult: 0.0 }
      scale_param { bias_term: true }
    }

shicai commented 7 years ago

You should know that the params in a BatchNorm layer don't need to be learned; they are calculated. The mean/var values are just accumulated statistics, not really learnable parameters, so please don't set lr or wd for them.

wlw208dzy commented 7 years ago

    layer {
      name: "conv1/bn"
      type: "BatchNorm"
      bottom: "conv1"
      top: "conv1"
      batch_norm_param { eps: 1e-4 }
    }
    layer {
      name: "conv1/scale"
      type: "Scale"
      bottom: "conv1"
      top: "conv1"
      param { lr_mult: 1.0 decay_mult: 1.0 }
      param { lr_mult: 2.0 decay_mult: 0.0 }
      scale_param { bias_term: true }
    }

Is it right?