z-x-yang / GCT

Gated Channel Transformation for Visual Recognition (CVPR 2020)
133 stars · 26 forks

pretrained model on ImageNet, PascalVOC object detection #4

Closed hezhu1996 closed 4 years ago

hezhu1996 commented 4 years ago

Hi~ I tried to implement GCT in detectron2 to train on Pascal VOC for object detection, but the performance isn't going up. I notice that you said in the paper: "All the backbone models are pre-trained on ImageNet using the scale and aspect ratio augmentation in and fine-tune on COCO with a batch size of 16".

Question: For the pretrained model, did you use one pretrained with GCT (ResNet-50+GCT) or just the original ResNet-50 on ImageNet? I used the original ImageNet-pretrained model (without GCT); is that the reason the performance isn't desirable? Are there any tricks for fine-tuning on the COCO dataset? I just want to reimplement GCT and get the mAP reported in your paper...

z-x-yang commented 4 years ago

We used a GCT-ResNet-50 pretrained on ImageNet. To reproduce our performance, you can use our PyTorch code to pretrain the backbone on ImageNet, and then use the pretrained backbone to fine-tune on the COCO dataset using maskrcnn-benchmark. You need to replace the backbone file, resnet.py.

Following the default setting in our paper, you need to remove the weight decay on the beta parameters of GCT. Specifically, you need to modify the code of maskrcnn-benchmark, here. For how to remove the weight decay on beta, you can refer to our code here.
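The idea of removing weight decay on beta can be sketched as two optimizer parameter groups in PyTorch. This is an illustrative stand-in (the dummy model and the name `beta` are assumptions here), not the actual maskrcnn-benchmark patch:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; in practice this would be GCT-ResNet-50.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 8, 3))
# Attach a dummy "beta" parameter to mimic GCT's gating bias.
model[0].register_parameter("beta", nn.Parameter(torch.zeros(8)))

decay, no_decay = [], []
for name, param in model.named_parameters():
    if name.endswith("beta"):      # GCT's gating bias: no regularization
        no_decay.append(param)
    else:
        decay.append(param)

# Two parameter groups: regular weight decay for everything else,
# zero weight decay for the beta parameters.
optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.01, momentum=0.9,
)
```

The same split would be applied inside maskrcnn-benchmark's optimizer construction, matching the GCT parameter names used in the repo.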

hezhu1996 commented 4 years ago

Thank you so much for your detailed explanation! So does that mean that without a GCT-ResNet pretrained model, it won't give a good result on the detection task? I have only one GPU (a 2080) and it's too weak to train on ImageNet (maybe it's possible to share the pretrained model?). Or are there alternative methods or workarounds to boost the performance on detection tasks, like freezing the ResNet layers... not sure :(

One more question: when comparing with SENet, did you also use a pretrained SENet? I only saw some pretrained SENet Caffe models on GitHub. Can they (the Caffe models) also be used in maskrcnn-benchmark? Many thanks~

hezhu1996 commented 4 years ago

In terms of fine-tuning the model, is it reasonable to freeze all the pretrained weights (original ResNet) and only train the GCT layers? It's really time-consuming to train on ImageNet from scratch... thanks!

z-x-yang commented 4 years ago

The backbone models (SENet or GCT-ResNet) should be pretrained on ImageNet from scratch. When fine-tuning on COCO, we follow the same setting as Mask R-CNN, in which all the batch normalization layers and the first two conv layers (if I remember correctly) of the backbone are frozen. More details can be found in maskrcnn-benchmark and its paper. We didn't change the training setting of Mask R-CNN on COCO.
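The freezing described above can be sketched in PyTorch like this. The tiny stand-in backbone is an assumption for illustration; maskrcnn-benchmark does this internally via its own config:

```python
import torch.nn as nn

# Minimal stand-in backbone: a "stem" conv, a BatchNorm, and one more conv.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, 3),    # stem conv, to be frozen
    nn.BatchNorm2d(8),
    nn.Conv2d(8, 8, 3),
)

# Freeze all BatchNorm layers: fix running statistics (eval mode)
# and stop gradient updates on their affine parameters.
for m in backbone.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.eval()
        for p in m.parameters():
            p.requires_grad = False

# Freeze the stem conv as well.
for p in backbone[0].parameters():
    p.requires_grad = False

trainable = [n for n, p in backbone.named_parameters() if p.requires_grad]
```

After this, only the later layer's parameters remain trainable, which mirrors the "frozen BN + frozen early layers" fine-tuning setup.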

I'm running the PyTorch version of GCT-ResNet-50 on ImageNet, and I'll share it with you when it's finished.

I haven't tried training only the GCT layers, but I suppose it's not a good idea.

hezhu1996 commented 4 years ago

@z-x-yang Hi, that's a very detailed and clear answer which solves most of my confusion. I also tried SENet and CBAM without pretrained models, and neither showed good performance, so I presume improvement only comes with an ImageNet-pretrained backbone. But I'm still not sure about one thing: GCNet recently combined Non-local and SENet and shows a clear improvement without any pretrained model (like GC+ResNet). From my perspective, GC, GCT, SE, and CBAM all modify the backbone structure, so why can only GCNet be used without a pretrained model?

PS. In mmdetection, they directly add GCNet and train on the COCO dataset, which improves results by around 2% (in my own experiments, it shows the same result on Pascal VOC). I'm just confused about how the pretrained model impacts the backbone network and the overall detection performance. Hope you can comment on it, thanks!

z-x-yang commented 4 years ago

An obvious difference between GCT (or SE and CBAM) and GCNet is that GCT uses a gate to adjust the input feature, while GCNet uses a residual connection. Maybe the residual connection is better for the gradient descent of additional modules (as in GCNet) when the learning rate is small, such as during fine-tuning.
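The gate-versus-residual contrast can be sketched as follows. This is only illustrative: the real GCT gate is built from an l2 channel embedding with a normalization step and learned alpha/gamma/beta parameters, and the real GCNet context branch is a learned global-context module, both simplified here:

```python
import torch

x = torch.randn(2, 8, 4, 4)

# GCT-style gating (sketch): a per-channel gate of the form 1 + tanh(...)
# multiplicatively rescales the input. With gamma = beta = 0 the gate is
# exactly 1, so the block starts as an identity mapping.
gamma = torch.zeros(1, 8, 1, 1)
beta = torch.zeros(1, 8, 1, 1)
embedding = x.pow(2).sum(dim=(2, 3), keepdim=True).sqrt()  # l2 channel embedding
gate = 1 + torch.tanh(gamma * embedding + beta)
gct_out = x * gate

# GCNet-style residual (sketch): the context branch's output is *added*
# to the input. With a zero-initialized branch it is also an identity,
# but gradients flow to the branch additively rather than through a product.
context = torch.zeros_like(x)  # stand-in for a freshly initialized branch
gcnet_out = x + context
```

Both blocks start as identities at initialization; the difference the comment above points at is in how gradients reach the new parameters (through a multiplicative gate versus an additive branch) once fine-tuning begins with a small learning rate.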

I've uploaded a pretrained model of GCT-ResNet-50. You can find it here.

z-x-yang commented 4 years ago

@TWDH Hello! If you have no more questions, I'll close this issue! Thanks!