microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers
MIT License
408 stars, 46 forks

Questions about downstream COCO detection #11

Closed (actuy closed 3 years ago)

actuy commented 3 years ago

Hi, could you provide a recipe for reproducing the COCO detection results? I tried training the downstream task with Mask R-CNN from your pre-trained checkpoint, but I cannot match the results reported in the paper, and I am not sure whether something went wrong during training. Could you please share more details? Thank you!

ChunyuanLI commented 3 years ago

We use the 3x schedule, as in this config file, but we sweep over different values of drop_path_rate and weight_decay. The drop_path_rate has a large impact on the results.

- name: drop_path_rate
  spec: discrete
  values: [0.0, 0.1, 0.2]
- name: weight_decay
  spec: discrete
  values: [0.05]
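
For readers who want to run the same sweep by hand, here is a minimal sketch that expands the two discrete parameters above into individual run settings. The `SWEEP` dictionary and the `expand_sweep` helper are illustrative names, not part of the EsViT code base; how each setting is passed to the detection training script depends on the config file you use.

```python
from itertools import product

# Values copied from the sweep spec above.
SWEEP = {
    "drop_path_rate": [0.0, 0.1, 0.2],
    "weight_decay": [0.05],
}


def expand_sweep(sweep):
    """Return one dict per combination of the discrete sweep values."""
    names = list(sweep)
    value_lists = [sweep[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_sweep(SWEEP)
for run in runs:
    # Each dict is one training run to launch and compare.
    print(run)
```

With a single weight_decay value, this amounts to three runs that differ only in drop_path_rate.
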

actuy commented 3 years ago

Thank you! I will try again with your configuration. Btw, which set of parameters did you use as the feature extractor in downstream tasks: the teacher or the student?

ChunyuanLI commented 3 years ago

teacher

rgbd-zml commented 2 years ago

Hi, could you share how to load the pre-trained model? When I load it, I get the following error: unexpected key in source state_dict: student, teacher, optimizer, epoch, args, dino_loss, fp16_scaler
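
The keys listed in that error (student, teacher, optimizer, epoch, ...) suggest the file is a full training checkpoint rather than a bare state_dict, so one plausible fix is to pull out the teacher weights, which the maintainer says are used downstream, before loading. The sketch below shows the idea on a plain dictionary. The helper name `extract_branch_weights` is hypothetical, and the `module.` and `backbone.` prefixes are an assumption about how the weights were wrapped during training; inspect the actual key names in your checkpoint before relying on them.

```python
def extract_branch_weights(checkpoint, branch="teacher",
                           prefixes=("module.", "backbone.")):
    """Select one branch of a training checkpoint and strip wrapper prefixes.

    `prefixes` is an assumption: adjust it to match the key names you
    actually see in your checkpoint.
    """
    if branch not in checkpoint:
        raise KeyError(f"checkpoint has no '{branch}' entry: {sorted(checkpoint)}")
    cleaned = {}
    for name, value in checkpoint[branch].items():
        # Strip each prefix, in order, if the key currently starts with it.
        for prefix in prefixes:
            if name.startswith(prefix):
                name = name[len(prefix):]
        cleaned[name] = value
    return cleaned


# Toy stand-in for a real checkpoint, only to show the key handling.
toy_checkpoint = {
    "student": {"module.backbone.patch_embed.weight": 0},
    "teacher": {
        "module.backbone.patch_embed.weight": 1,
        "module.head.weight": 2,
    },
    "epoch": 300,
}
weights = extract_branch_weights(toy_checkpoint)
print(sorted(weights))

# With PyTorch, the same helper would be used roughly like this
# (the path and `model` are placeholders, not names from the thread):
#   checkpoint = torch.load("checkpoint.pth", map_location="cpu")
#   model.load_state_dict(extract_branch_weights(checkpoint), strict=False)
```

Loading with `strict=False` tolerates leftover keys such as the projection head, but it also hides real mismatches, so it is worth checking the missing and unexpected keys that `load_state_dict` returns.
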