Closed Yingdong-Hu closed 2 years ago
Hi, @Alxead! Thanks for your interest in our work.
Q1: In our implementation, if a weight is frozen during pre-training (i.e., the position embeddings in MAE & MoCoV3, and the patch embedding in MoCoV3), we also freeze it during our COCO fine-tuning. The impact on final performance is still unclear because we haven't run experiments on this.
Q2 & Q3: We haven't tried BEiT's initialization or BEiT's interpolation method, but in our experiments simple linear interpolation of the relative position bias works well on the SimMIM pre-trained weights (it obtains 48.7 box AP with 25-epoch training). Here are our configurations for the SimMIM pre-trained weights; hope these help you!
model.backbone.bottom_up.vit.stop_grad_conv1 = False
model.backbone.bottom_up.vit.sincos_pos_embed = False
model.backbone.bottom_up.vit.init_values = 0.1 # simmim initialized model with layerscale
model.backbone.bottom_up.vit.beit_qkv_bias = True
optimizer.weight_decay = 0.1
optimizer.lr = 8e-5
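For context, the "simple linear interpolation" mentioned above amounts to bilinearly resizing the (2H-1, 2W-1, heads) relative position bias table to the new window size. A minimal NumPy sketch of that idea (not the repository's actual code; the function name and shapes are illustrative):

```python
import numpy as np

def resize_rel_pos_bias(table, new_size):
    """Bilinearly resize a (2H-1, 2W-1, heads) relative position bias
    table to (new_h, new_w, heads). Illustrative sketch only, not the
    exact implementation in models/benchmarking.py."""
    old_h, old_w, _ = table.shape
    new_h, new_w = new_size
    # Sample coordinates of the new grid expressed in the old grid.
    ys = np.linspace(0, old_h - 1, new_h)
    xs = np.linspace(0, old_w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    y1 = np.minimum(y0 + 1, old_h - 1)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, old_w - 1)
    # Fractional parts, broadcast over the heads dimension.
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    # Interpolate along x on the two bracketing rows, then along y.
    top = table[y0][:, x0] * (1 - wx) + table[y0][:, x1] * wx
    bot = table[y1][:, x0] * (1 - wx) + table[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

In practice this would be applied per attention layer when loading a checkpoint pre-trained at a smaller window size.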
We may run some supplementary experiments soon to find out the effects of a learnable position embedding and of BEiT's interpolation method. If you reach any conclusions in your experiments, please let me know; I'd be happy to discuss them with you.
I believe the issue at hand has been addressed, so I'm closing this. Feel free to ask if you have further questions.
Hi, I have some detailed questions about Benchmarking-ViT-B.
Absolute position embeddings: You use sincos_pos_embed=True and freeze the embedding. In the original paper, the authors transfer the pre-trained absolute position embeddings (actually sincos embeddings) for MAE and randomly initialize the absolute position embeddings for BEiT, and both of them seem to be trainable. Why use frozen absolute position embeddings here?
Relative position biases: BEiT uses relative position biases during pre-training. If linear interpolation is used to adapt the relative position biases to a higher resolution, performance degrades significantly on the semantic segmentation task, so they use a more sophisticated interpolation algorithm. This code just uses linear interpolation; does this affect performance on the detection task?
https://github.com/hustvl/MIMDet/blob/9e1dea10fd5eb26567cb2bac51f2b652d81620b9/models/benchmarking.py#L547
The config to use BEiT initialization: If I were to use BEiT, how should I modify the config file? What I am sure about is setting init_values=0.1 and beit_qkv_bias=True, but I'm not sure whether sincos_pos_embed=False is also needed. And how should the relative position biases be resized to a higher resolution; if only linear interpolation is used, will it degrade performance? Is there anything else that needs to be modified in the config file?
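Concretely, pulling together the options I mentioned, my tentative BEiT config would be the sketch below (unverified; the sincos_pos_embed value is my guess, since BEiT's absolute position embedding is learned rather than sincos):

```python
model.backbone.bottom_up.vit.init_values = 0.1       # BEiT is pre-trained with layerscale
model.backbone.bottom_up.vit.beit_qkv_bias = True    # BEiT's qkv bias scheme
model.backbone.bottom_up.vit.sincos_pos_embed = False  # guess: BEiT uses a learned pos embed
```

Please correct me if any of these are wrong.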