vidit09 / domaingen

CLIP the Gap CVPR 2023

How to do clip init? #27

Open 1184125805 opened 2 weeks ago

1184125805 commented 2 weeks ago

Could you please share how you applied the CLIP initialization to the original RN101? Could you also let me know where I can find the corresponding code?

vidit09 commented 2 weeks ago

Hello. You can see the initialization in this line, which reuses the CLIP instance from here. The main file importing the CLIP models is here.
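For reference, a minimal sketch of what that initialization amounts to, using the public openai/clip API (the variable names here are illustrative, not the repo's exact code):

```python
import clip

# Load CLIP with a ResNet-101 visual tower; clip.load returns (model, preprocess).
model, _ = clip.load("RN101", device="cpu")

# model.visual is CLIP's ModifiedResNet. The detector reuses this single
# instance both as the detection backbone and inside the text-based classifier,
# i.e. the same idea as the set_backbone_model(...) call discussed below.
visual_enc = model.visual
```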

1184125805 commented 2 weeks ago

Thank you! If I want to use RN50+FPN.yaml with RN50 CLIP init, should I change `self.backbone.set_backbone_model(self.roi_heads.box_predictor.cls_score.visual_enc)`?

1184125805 commented 2 weeks ago

If I change R101 to R50, or R101 to R101+FPN, the CLIP init does not work. What should I change in the code? For example:

```yaml
BASE_YAML: "COCO-Detection/faster_rcnn_R_101_C4_3x.yaml"
BASE_YAML: "COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml"
```

vidit09 commented 2 weeks ago

Hello. You'll need to rewrite backbone.py following the original detectron2 FPN implementation, replacing its ResNet with CLIP's ResNet.
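To make that concrete, here is a hedged sketch of wrapping CLIP's ResNet as a detectron2 bottom-up backbone and putting detectron2's own FPN on top. The class name and stage bookkeeping are assumptions for illustration, not the repo's actual backbone.py:

```python
import clip
import torch.nn as nn
from detectron2.layers import ShapeSpec
from detectron2.modeling.backbone import Backbone, FPN
from detectron2.modeling.backbone.fpn import LastLevelMaxPool


class CLIPResNetBottomUp(Backbone):
    """Expose CLIP's ModifiedResNet stages as res2..res5 for detectron2's FPN."""

    def __init__(self, clip_visual):
        super().__init__()
        relu = nn.ReLU(inplace=True)  # the stem ReLUs carry no weights
        self.stem = nn.Sequential(
            clip_visual.conv1, clip_visual.bn1, relu,
            clip_visual.conv2, clip_visual.bn2, relu,
            clip_visual.conv3, clip_visual.bn3, relu,
            clip_visual.avgpool,
        )
        self.res2 = clip_visual.layer1
        self.res3 = clip_visual.layer2
        self.res4 = clip_visual.layer3
        self.res5 = clip_visual.layer4

    def forward(self, x):
        x = self.stem(x)
        out = {}
        for name in ("res2", "res3", "res4", "res5"):
            x = getattr(self, name)(x)
            out[name] = x
        return out

    def output_shape(self):
        # RN50/RN101 widths; strides follow the standard ResNet schedule.
        channels = {"res2": 256, "res3": 512, "res4": 1024, "res5": 2048}
        strides = {"res2": 4, "res3": 8, "res4": 16, "res5": 32}
        return {k: ShapeSpec(channels=channels[k], stride=strides[k]) for k in channels}


clip_model, _ = clip.load("RN50", device="cpu")
backbone = FPN(
    bottom_up=CLIPResNetBottomUp(clip_model.visual),
    in_features=["res2", "res3", "res4", "res5"],
    out_channels=256,
    top_block=LastLevelMaxPool(),
)
```

Note this sketch skips CLIP's attnpool, which an FPN-style detector doesn't use on the backbone path.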

1184125805 commented 2 weeks ago

[screenshot of the parameter-freezing code] Thank you for answering! I wonder why only these three parameter groups get `requires_grad = True`. If I do not want to split the backbone into Va and Vb, should I change this?

vidit09 commented 2 weeks ago

So the reason we split was to mimic the setting where Faster R-CNN is initialized with ImageNet pre-trained weights: the output of layer3 is fed to the RPN, and layer4 is used after ROIAlign. If you don't want to finetune them, you can freeze them.
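As a sketch of that split (attribute names follow openai/CLIP's ModifiedResNet; Va/Vb are the paper's names for the two halves):

```python
import torch.nn as nn


def split_backbone(visual_enc):
    """Split CLIP's ResNet into Va (feeds the RPN) and Vb (runs after ROIAlign).

    A sketch, not the repo's code; the stem ReLUs are re-created here since
    they carry no weights.
    """
    relu = nn.ReLU(inplace=True)
    va = nn.Sequential(  # stem + layer1..layer3 -> feature map for the RPN
        visual_enc.conv1, visual_enc.bn1, relu,
        visual_enc.conv2, visual_enc.bn2, relu,
        visual_enc.conv3, visual_enc.bn3, relu,
        visual_enc.avgpool,
        visual_enc.layer1, visual_enc.layer2, visual_enc.layer3,
    )
    vb = visual_enc.layer4  # applied per box after ROIAlign (C4-style head)
    return va, vb
```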

1184125805 commented 1 week ago

Does the attention pooling layer not need to be frozen when training Faster R-CNN? Why do we need to train the attention pooling?

vidit09 commented 1 week ago

In our ablation, we see improved performance by unfreezing the attention pooling layer when we are finetuning layer3 and layer4 of the ResNet.
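A compact way to reproduce that trainable set (again assuming openai/CLIP attribute names, where `attnpool` is the attention pooling layer):

```python
# visual_enc is CLIP's ModifiedResNet, e.g. clip.load("RN101")[0].visual.
# Train only layer3, layer4 and the attention pooling; freeze everything else.
for name, param in visual_enc.named_parameters():
    param.requires_grad = name.startswith(("layer3", "layer4", "attnpool"))
```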