switchablenorms / Switchable-Normalization

Code for Switchable Normalization from "Differentiable Learning-to-Normalize via Switchable Normalization", https://arxiv.org/abs/1806.10779

Could you share the resnet-101 model pretrained on Imagenet? #4

Closed PkuRainBow closed 6 years ago

PkuRainBow commented 6 years ago

Really great work!

I am wondering when you might be able to share the pretrained ResNet-101 model.

Thanks!

pluo911 commented 6 years ago

We'll release this model. Contributions of this model are also welcome.

wanghan0501 commented 6 years ago

Thanks for your contribution! I have tried it and it's amazing. I'm looking forward to your 'resnetv2sn18', 'resnetv2sn34', and 'resnetv2sn101' pretrained models.

PkuRainBow commented 6 years ago

@pluo911 I have tried to use the ResNet50v2+SN model for semantic segmentation tasks. The performance is bad.... Should we use a different learning rate and weight decay with the SN module?

pluo911 commented 6 years ago

@PkuRainBow We didn't tune the learning rate or weight decay for SN. Pay attention to the importance weight of BN in SN: if it is large (say >0.3), you may try batch average. SN works well when the minibatch size is >= 2. If the minibatch size is 1, you have to remove BN from SN (and don't inject random noise at test time), because BN is the same as IN during training. We'll release the implementations of COCO detection and segmentation very soon for your reference.
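For readers following along: the mechanism behind the "importance weight of BN" can be illustrated with a minimal NumPy sketch of SN's forward pass. This is my own simplification, not the repo's PyTorch implementation; the learnable affine parameters and running statistics are omitted. SN normalizes with a softmax-weighted mixture of IN, LN, and BN statistics:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def switchable_norm(x, mean_logits, var_logits, eps=1e-5):
    """Simplified SN over an NCHW tensor. `mean_logits` and `var_logits`
    are the learnable importance logits, ordered (IN, LN, BN)."""
    mu_in = x.mean(axis=(2, 3), keepdims=True)      # per sample, per channel
    var_in = x.var(axis=(2, 3), keepdims=True)
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)   # per sample, all channels
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)   # per channel, whole batch
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)
    w_mu = softmax(mean_logits)                     # importance weights
    w_var = softmax(var_logits)
    mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn
    return (x - mu) / np.sqrt(var + eps)
```

When the BN logits dominate, the output behaves like plain batch normalization, which is why a large BN importance weight makes SN sensitive to minibatch size.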

PkuRainBow commented 6 years ago

@pluo911 I am using the deeplabv3 framework. The batch size is 8 on 4 P100 GPUs. I just use the ImageNet-pretrained ResNet50v2+SN and fine-tune it on the Cityscapes dataset. The original BN supports multi-GPU sync; I am wondering whether your method also needs multi-GPU sync?

HiKapok commented 6 years ago

@pluo911 Thanks for your contribution. Could you please give more details about how to remove BN when I use a mini-batch of 1? Should I remove BN in both training and testing, or only in testing? Should the importance weights of BN be set to zero when BN is removed?

pluo911 commented 6 years ago

@HiKapok We have released the code and models for object detection. You may refer to that. Finetuning BN is unstable when the task has a small batch size. You may either reduce the importance weights of BN, or just remove it in training.
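One straightforward way to "remove" BN, sketched below as an illustration (my own sketch, not the repo's code): mask the BN logit before the softmax, so the importance weights renormalize over IN and LN only. This also answers whether the BN weight must be set to zero explicitly; masking the logit achieves that automatically.

```python
import numpy as np

def sn_weights(logits, use_bn=True):
    """Importance weights for (IN, LN, BN) from learnable logits via softmax.
    With use_bn=False the BN branch gets exactly zero weight and the
    remaining weights renormalize to sum to one."""
    z = np.asarray(logits, dtype=float).copy()
    if not use_bn:
        z[2] = -np.inf  # exp(-inf) == 0, so BN drops out of the mixture
    e = np.exp(z - z[np.isfinite(z)].max())
    return e / e.sum()
```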

pluo911 commented 6 years ago

@PkuRainBow We didn't sync BN across GPUs.

HiKapok commented 6 years ago

@pluo911 Which is better: a model trained with GN, or one trained with SN without BN? Did you try this?

switchablenorms commented 6 years ago

@HiKapok SN without BN is still better than GN. You can refer to the switchable normalization detection repo.

HiKapok commented 6 years ago

@switchablenorms But in your paper, SN is worse than GN for the (8,1) setting.

pluo911 commented 6 years ago

@HiKapok SN achieves better results on COCO even though the pretrained SN model is not as good as GN on ImageNet. Performance on ImageNet can be a good indicator for finetuning, but that is just empirical experience, not scientific knowledge. There are too many other factors in the target task that could change the results.

HiKapok commented 6 years ago

@pluo911 I'm sorry that I didn't read the results in SwitchNorm_Detection. Thanks for your contribution. BTW, did you compare their performance when fine-tuning with a single image per GPU? I noticed that you reported the performance at 2 im/GPU; how about 1 im/GPU? I think this is more important in some applications.

pluo911 commented 6 years ago

@HiKapok Read GN’s model zoo.

Since we have released ResNet101 pretrained with SN and there is no more discussion related to the model, this issue should be closed.

PkuRainBow commented 6 years ago

I have tried your newly released ResNet101v1+SN for segmentation tasks on Cityscapes. The performance is very poor compared with the original ResNet101. Thus I suspect that your method does not generalize well to semantic segmentation tasks.

pluo911 commented 6 years ago

@PkuRainBow Thanks for your help in evaluating SN on Cityscapes. From your previous comments, I understand that you used a batch size setting of (4,2). When you use a pretrained model of (8,32) and finetune it with (4,2), the weights of BN in SN should be reduced (there are several ways to do this); otherwise the results could be worse than freezing BN. I'm sorry that I cannot tell you exactly how to do that at this moment (due to non-technical issues -.-'), but we'll release applications of SN, including Cityscapes, very soon.
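As one illustrative option (my own sketch, not the authors' released recipe): since the importance weights come from a softmax over per-normalizer logits, subtracting a constant from the BN logit of a pretrained checkpoint shrinks BN's weight before small-batch finetuning, while leaving the IN/LN balance intact.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def shrink_bn(logits, delta=4.0):
    """Lower the BN logit (index 2; order IN/LN/BN is an assumption here)
    by `delta` so BN's softmax importance weight shrinks. `delta` is a
    hypothetical knob, not a value from the paper."""
    z = np.asarray(logits, dtype=float).copy()
    z[2] -= delta
    return z
```

Applied to each SN layer's mean and variance logits in the checkpoint, this softly interpolates between the pretrained mixture and an IN/LN-only mixture.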