Closed: dereyly closed this issue 7 years ago
If we want to update the conv layer parameters, we cannot merge the scale layer into the conv layer for the training phase. I didn't run the experiment of leaving the BatchNorm layer after roi_pooling unmerged, but I don't think it would make a big difference, because the parameters obtained from ImageNet generalize very well.
@soeaver Thank you for the answer! Can you explain in more detail why "we couldn't merge the scale layer into conv layer for training phase"? I can imagine only one reason: a Scale layer with lr_mult: 0 stays constant, but when we merge it into the convolution layer some weights become larger and get penalized by weight_decay.
If we only merge BN into Scale, the feature map is still normalized by a single Scale layer. But if we merge both BN and Scale into the conv layer and then update the conv layer parameters during training, that destroys the normalization parameters (because only the conv layer is left). So BN and Scale can only be merged into the conv layer at the inference phase.
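A minimal numpy sketch of this inference-time fold, assuming conv weights of shape (out_channels, in_channels, kH, kW) and per-channel BN/Scale parameters (function and variable names are just illustrative; note that Caffe's BatchNorm blobs also store a moving-average factor that the saved mean/variance must be divided by first):

```python
import numpy as np

def fold_bn_scale_into_conv(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold BatchNorm statistics and Scale parameters into the preceding conv.

    W:            conv weights, shape (out_channels, in_channels, kH, kW)
    b:            conv bias, shape (out_channels,); pass zeros if the conv has no bias
    mean, var:    BN running mean / variance, shape (out_channels,)
    gamma, beta:  Scale layer weight / bias, shape (out_channels,)
    """
    # BN + Scale apply y = gamma * (x - mean) / sqrt(var + eps) + beta per channel,
    # which is a per-channel affine transform the conv can absorb.
    alpha = gamma / np.sqrt(var + eps)
    W_folded = W * alpha[:, None, None, None]
    b_folded = (b - mean) * alpha + beta
    return W_folded, b_folded
```

Once SGD updates the folded conv weights, that per-channel factor is no longer separable from the weights, which is why this fold only makes sense at inference time.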
Hello. Great work! In all models you merge the BatchNorm layer but keep the Scale layer with lr_mult: 0. Why can't we merge the Scale layer too? (I tried this but got poor results, why?) Maybe the BatchNorm layer after roi_pooling is useful.
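For reference, my understanding of what the released models do is simply push the BN running statistics into the frozen Scale layer, roughly like this numpy sketch (names are illustrative):

```python
import numpy as np

def fold_bn_into_scale(mean, var, gamma, beta, eps=1e-5):
    """Absorb BN running statistics into the following Scale layer, so the
    Scale layer alone computes y = gamma_new * x + beta_new
    (and lr_mult: 0 keeps it frozen during training)."""
    gamma_new = gamma / np.sqrt(var + eps)
    beta_new = beta - gamma * mean / np.sqrt(var + eps)
    return gamma_new, beta_new
```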