The regularization of depthwise convolution

shicai / MobileNet-Caffe

Caffe Implementation of Google's MobileNets (v1 and v2)

BSD 3-Clause "New" or "Revised" License


The regularization of depthwise convolution #56

Open BOBrown opened 6 years ago

BOBrown commented 6 years ago

The authors wrote the following in the paper: "Additionally, we found that it was important to put very little or no weight decay (l2 regularization) on the depthwise filters since there are so few parameters in them."

Therefore, I think we should set decay_mult: 0.0 for the depthwise convolution layers in the MobileNet prototxt.
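As a rough sketch of what that change could look like, here is one depthwise layer with weight decay disabled. The layer name, blob names, and channel count are illustrative assumptions, not copied from the repository's prototxt. The fragment relies only on standard Caffe behaviour: `decay_mult` scales the solver's global `weight_decay` for that parameter blob, so a value of 0 turns off L2 regularization for this layer's weights.

```
layer {
  name: "conv2_1/dw"          # illustrative name for a depthwise layer
  type: "Convolution"
  bottom: "conv1"
  top: "conv2_1/dw"
  param {
    lr_mult: 1
    decay_mult: 0             # no weight decay on the depthwise filters
  }
  convolution_param {
    num_output: 32            # assumed channel count, for illustration only
    bias_term: false
    pad: 1
    kernel_size: 3
    group: 32                 # group == input channels == num_output makes this depthwise
    stride: 1
    weight_filler { type: "msra" }
  }
}
```

The pointwise (1x1) layers would keep their usual `decay_mult: 1`, since the quoted sentence concerns only the depthwise filters.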

mathmanu commented 6 years ago

Isn't this line taken from the MobileNetV1 paper? I couldn't find any such statement in the MobileNetV2 paper.

I wonder whether all parameters are meant to be decayed in MobileNetV2 training. At least that's the understanding I get from looking at the (very few) repositories that provide a training script, e.g. https://github.com/Randl/MobileNetV2-pytorch