BOBrown opened 6 years ago
Isn't this line taken from the MobileNetV1 paper? I couldn't find any such statement in the MobileNetV2 paper.
I wonder whether all parameters are meant to be decayed in MobileNetV2 training. At least, that's my understanding from looking at the (very few) repositories that provide a training script, e.g. https://github.com/Randl/MobileNetV2-pytorch
The authors wrote the following in the paper: "Additionally, we found that it was important to put very little or no weight decay (l2 regularization) on the depthwise filters since their are so few parameters in them."
Therefore, I think we should set `decay_mult: 0.0` for the depthwise convolution layers in the MobileNet prototxt.
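A minimal sketch of what this could look like for a single depthwise layer, assuming a Caffe-style MobileNet prototxt. The layer and blob names (`conv2_1/dw`, `conv1`) and the channel count of 32 are hypothetical placeholders, not taken from any particular prototxt. Setting `group` equal to the number of input channels (and to `num_output`) is what makes the convolution depthwise, and `decay_mult: 0` in the weights' `param` block scales the solver's global `weight_decay` to zero for that blob only:

```prototxt
layer {
  name: "conv2_1/dw"        # hypothetical depthwise layer name
  type: "Convolution"
  bottom: "conv1"           # hypothetical input blob with 32 channels
  top: "conv2_1/dw"
  # Only one param block is needed because bias_term is false;
  # it applies to the convolution weights.
  param {
    lr_mult: 1
    decay_mult: 0           # no L2 weight decay on the depthwise filters
  }
  convolution_param {
    num_output: 32          # same as the number of input channels
    group: 32               # one group per channel -> depthwise convolution
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: false
    weight_filler { type: "msra" }
  }
}
```

The pointwise (1x1) convolutions would keep their usual `decay_mult: 1`, so weight decay still applies to most of the network's parameters. If "very little" rather than "no" weight decay is preferred, a small value such as `decay_mult: 0.1` would be the corresponding setting.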