Why don't use sigmoid in the switch

joe-siyuan-qiao / DetectoRS

DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution

Apache License 2.0


Why don't use sigmoid in the switch #67

Open XinZhangRadar opened 4 years ago

XinZhangRadar commented 4 years ago

I have checked the paper and the code. In the SAC module, the switch function is a 1x1 conv layer. After you compute the switch feature map, you don't apply a sigmoid to it, yet you set the weights to '1-switch' and 'switch'. If you don't use sigmoid, why compute '1-switch'? You could set the weights to '-switch' and 'switch' directly.

joe-siyuan-qiao commented 4 years ago

Thanks for the great question. We tried Sigmoid but observed a slight performance decrease. The problem was that it was hard to initialize the weights before the Sigmoid so that models loaded from the pre-trained checkpoints produce the same outputs as the models without SAC. For example, '-switch' and 'switch' would break the pre-trained models. So we chose not to use Sigmoid, even though this makes the model less interpretable.
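A minimal sketch of the initialization argument, with plain scalars standing in for feature maps. Everything here is illustrative rather than taken from the DetectoRS code: the names `convex_mix`, `signed_mix`, `y_small`, and `y_large` are hypothetical, the sample values are made up, and the assignment of '-switch' and 'switch' to the two branches is an assumption. It compares how each weighting behaves when the goal is to reproduce the pre-trained branch output at initialization.

```python
import math


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def convex_mix(s, y_small, y_large):
    """'switch' and '1-switch' weighting: s * y_small + (1 - s) * y_large."""
    return s * y_small + (1.0 - s) * y_large


def signed_mix(s, y_small, y_large):
    """'switch' and '-switch' weighting (branch assignment is an assumption)."""
    return s * y_small - s * y_large


# Hypothetical branch outputs: y_small stands for the pre-trained convolution,
# y_large for the extra branch that the checkpoint knows nothing about.
y_small, y_large = 2.0, -3.0

# Without sigmoid, a constant switch of 1 reproduces the pre-trained output exactly.
out_linear = convex_mix(1.0, y_small, y_large)

# With sigmoid, a zero pre-activation mixes the two branches half and half.
out_sigmoid_zero = convex_mix(sigmoid(0.0), y_small, y_large)

# A large pre-activation gets close, but sigmoid never reaches 1 exactly.
out_sigmoid_big = convex_mix(sigmoid(10.0), y_small, y_large)

# With '-switch' and 'switch', the weights sum to zero, so no input-independent
# constant recovers y_small: s = 1 subtracts the other branch, s = 0 gives 0.
out_signed_one = signed_mix(1.0, y_small, y_large)
out_signed_zero = signed_mix(0.0, y_small, y_large)

print(out_linear, out_sigmoid_zero, out_sigmoid_big, out_signed_one, out_signed_zero)
```

In this toy setting only the '1-switch' and 'switch' weighting without sigmoid has a constant initial switch value that leaves the pre-trained output unchanged, which is one way to read the trade-off described above.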

XinZhangRadar commented 4 years ago

Thanks for the reply. The idea of "w + Δw" is cool~