soeaver / caffe-model

Caffe models (including classification, detection and segmentation) and deploy files for famous networks
MIT License

Merge scale layer #49

Closed dereyly closed 7 years ago

dereyly commented 7 years ago

Hello. Great work! In all the models you merge the BatchNorm layer, but keep the Scale layer with lr_mult: 0. Why can't we merge the Scale layer too? (I tried this, but got poor results -- why?) Maybe the BatchNorm layer after roi_pooling is useful?

soeaver commented 7 years ago

If we want to update the conv layer parameters, then we can't merge the scale layer into the conv layer for the training phase. I haven't run the experiment where the BatchNorm layer after roi_pooling is left unmerged, but I don't think it would make a big difference, because the parameters obtained from ImageNet have very good generalization ability.
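
To illustrate the pattern (a minimal pycaffe NetSpec sketch; the layer names and input shape are just placeholders, not taken from the repo's prototxts): the conv layer stays trainable, while the Scale layer that absorbed the BatchNorm statistics is frozen with `lr_mult: 0` so the normalization stays fixed during fine-tuning.

```python
from caffe import layers as L, NetSpec

n = NetSpec()
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))

# Trainable convolution (bias dropped; the frozen Scale layer provides the shift)
n.conv1 = L.Convolution(n.data, num_output=64, kernel_size=3, pad=1,
                        bias_term=False,
                        param=[dict(lr_mult=1, decay_mult=1)])

# Scale layer whose gamma/beta absorb the BatchNorm statistics;
# lr_mult: 0 / decay_mult: 0 keeps them constant during training
n.scale1 = L.Scale(n.conv1, bias_term=True,
                   param=[dict(lr_mult=0, decay_mult=0),
                          dict(lr_mult=0, decay_mult=0)])
n.relu1 = L.ReLU(n.scale1, in_place=True)

print(n.to_proto())  # emits the corresponding prototxt snippet
```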

dereyly commented 7 years ago

@soeaver Thank you for the answer! Can you explain in more detail why "we couldn't merge the scale layer into the conv layer for the training phase"? I can imagine only one reason: the Scale layer with lr_mult: 0 is constant, whereas when we merge it into the convolution layer some weights become larger and get penalized by weight_decay.
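
Spelling out that hypothesis (my reading of it, not something confirmed in the thread): with a frozen per-channel factor folded into the conv weights, weight_decay acts on the rescaled weights,

```latex
s_c = \frac{\gamma_c}{\sqrt{\sigma_c^2 + \varepsilon}}, \qquad
\text{unmerged penalty: } \lambda \lVert W_c \rVert^2, \qquad
\text{merged penalty: } \lambda \lVert s_c W_c \rVert^2 = \lambda\, s_c^2 \lVert W_c \rVert^2
```

so channels with s_c > 1 would be penalized more heavily after merging.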

soeaver commented 7 years ago

If we only merge bn into the scale layer, the feature map can still be normalized by that single scale layer. But if we merge both bn and scale into the conv layer and then update the conv layer parameters during the training phase, that will destroy the normalization parameters (because we only have the conv layer left). So we can only merge bn and scale into the conv layer at inference phase.
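
For reference, the inference-time fold can be sketched like this (a minimal numpy sketch; the function name is illustrative, and in Caffe the stored BatchNorm mean/var have to be divided by the layer's scale_factor blob, blobs[2], before being used here):

```python
import numpy as np

def fold_bn_scale_into_conv(W, b, mean, var, gamma, beta, eps=1e-5):
    # W: conv weights, shape (out_channels, in_channels, kh, kw)
    # b: conv bias, shape (out_channels,); use zeros if the conv has bias_term: false
    # mean, var: BatchNorm statistics (already divided by the scale_factor blob)
    # gamma, beta: Scale layer weight and bias
    factor = gamma / np.sqrt(var + eps)           # per-output-channel multiplier
    W_folded = W * factor.reshape(-1, 1, 1, 1)    # rescale each output filter
    b_folded = (b - mean) * factor + beta         # fold the shift into the bias
    return W_folded, b_folded
```

The folded weights and biases then replace the conv layer's blobs, and the BatchNorm/Scale layers are removed from the deploy prototxt.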