wanghao9610 / OV-DINO

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
https://wanghao9610.github.io/OV-DINO
Apache License 2.0

Are there any differences between pre-training and fine-tuning? #17

Closed kaijinz0228 closed 2 months ago

kaijinz0228 commented 2 months ago

Dear authors,

I tried to reproduce the pre-training of OV-DINO on Objects365 using the fine-tuning code, but the training process seemed abnormal. After 6 epochs, I got 0.371 zero-shot mAP on COCO, which is quite far from the result reported in the paper (0.495). I used the exact pre-training config mentioned in the paper, except for the batch size (I used 48 vs. the 128 used in the paper). I wonder whether there are differences between the pre-training and fine-tuning processes? If possible, could you make your pre-training logs available?
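As a side note, one variable I am unsure about is the learning rate: when reducing the global batch size from 128 to 48, the commonly used linear scaling rule would also rescale the base LR, and keeping the paper's LR unchanged could partly explain the gap. A minimal sketch of that heuristic (the baseline LR below is an illustrative assumption, not the paper's value):

```python
# Linear LR scaling rule (Goyal et al., 2017): scale the base learning rate
# in proportion to the change in global batch size. The baseline values here
# are illustrative assumptions, not the OV-DINO paper's exact hyperparameters.
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Return the learning rate rescaled for a different global batch size."""
    return base_lr * new_batch_size / base_batch_size

paper_lr = 1e-4          # assumed baseline LR, for illustration only
paper_batch_size = 128   # batch size reported in the paper
my_batch_size = 48       # batch size actually used in this reproduction

print(scale_lr(paper_lr, paper_batch_size, my_batch_size))  # -> 3.75e-05
```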

Thank you

wanghao9610 commented 2 months ago

Thanks for your attention to our work. The O365 pre-training pipeline is almost the same as the COCO fine-tuning pipeline, so in theory you can reproduce the results by following the COCO pipeline. There may be some bugs in your pre-training setup; I will update the O365 pre-training code and log this week.

kaijinz0228 commented 2 months ago

Thank you for your response and code sharing.

wanghao9610 commented 2 months ago

@kaijinz0228 Hi, I have released the O365 pre-training code and log; you can re-pull the repo and try to reproduce our results. If you run into any problems, feel free to raise an issue again. PLEASE NOTE: The batch size differs between the code and the paper: for O365 pre-training it is 64 on 16 GPUs, because of the smaller dataset size. For [O365, GoldG] and [O365, GoldG, CC1M‡] pre-training it is 128 on 32 GPUs, consistent with what the paper reports. I will release that code and the corresponding logs after our paper is accepted.
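If you need to adapt the run to a different GPU count, a rough sketch of how the global batch size could be overridden in a detectron2/detrex-style LazyConfig setup is below. The config path and attribute names are illustrative assumptions; please check the released O365 pre-training config for the actual keys.

```python
# Hypothetical LazyConfig-style override for the O365 pre-training setup.
# The config file path and attribute names are illustrative assumptions,
# not necessarily the exact names used in this repo.
from detectron2.config import LazyConfig

cfg = LazyConfig.load("projects/ovdino/configs/ovdino_o365_pretrain.py")  # hypothetical path

# O365 pre-training: global batch size 64 spread over 16 GPUs (4 images/GPU).
cfg.dataloader.train.total_batch_size = 64

# If training on fewer GPUs, keep the same global batch size (or rescale the
# learning rate accordingly) so the schedule matches the released log.
LazyConfig.save(cfg, "ovdino_o365_pretrain_16gpu.py")
```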